Identification of virulence-associated regions RD1 and RD5 leading to improved vaccines of M. bovis BCG and M. microti



Virulence-associated regions have long been sought in Mycobacterium. The present invention concerns the identification of two genomic regions that are shown to be associated with a virulent phenotype in mycobacteria, particularly in M. tuberculosis. It also concerns fragments of said regions.



One of these two regions is known as RD5, as disclosed in Molecular Microbiology (1999), vol. 32, pages 643 to 655 (Gordon S.V. et al.). The other region, named RD1-2F9, spans the known region RD1 disclosed in the same article. Both regions RD1 and RD5, or at least one of them, are absent from the vaccine strains M. bovis BCG and M. microti, strains that were used as live vaccines in the 1960's.



Other applications encompassed by the present invention relate to the use of all or part of said regions to detect virulent strains of mycobacteria, particularly M. tuberculosis, in humans and animals. The regions RD1-2F9 and RD5 are considered virulence markers under the present invention.
Recombinant mycobacteria, particularly M. bovis BCG, whose genome has been modified by introduction of all or part of the RD1-2F9 region and/or the RD5 region can be used to stimulate the immune system of patients with cancer, for example bladder cancer.


The present invention relates to a strain of M. bovis BCG or M. microti, wherein said strain has integrated all or part of the region RD1-2F9 responsible for enhanced immunogenicity to the tubercle bacilli, especially the genes encoding the ESAT-6 and CFP-10 antigens. These strains will be referred to as the M. bovis BCG::RD1 or M. microti::RD1 strains and are useful as a new, improved vaccine for the prevention of tuberculosis infections and for treating superficial bladder cancer.

Mycobacterium bovis BCG (bacille Calmette-Guérin) has been used since 1921 to prevent tuberculosis, although it is of limited efficacy against adult pulmonary disease in highly endemic areas. Mycobacterium microti, another member of the Mycobacterium tuberculosis complex, was originally described as the infective agent of a tuberculosis-like disease in voles (Microtus agrestis) in the 1930's (Wells, A. Q. 1937. Tuberculosis in wild voles. Lancet 1221; and Wells, A. Q. 1946. The murine type of tubercle bacillus. Medical Research Council Special Report Series 259:1-42). Until recently, M. microti strains were thought to be pathogenic only for voles and not for humans, and some were even used as live vaccines. In fact, the vole bacillus proved to be safe and effective in preventing clinical tuberculosis in a trial involving roughly 10,000 adolescents in the UK in the 1950's (Hart, P. D., and I. Sutherland. 1977. BCG and vole bacillus vaccines in the prevention of tuberculosis in adolescence and early adult life. British Medical Journal 2:293-295). At about the same time, another strain, OV166, was successfully administered to half a million newborns in Prague, in the former Czechoslovakia, without any serious complications (Sula, L., and I. Radkovsky. 1976. Protective effects of M. microti vaccine against tuberculosis. J. Hyg. Epid. Microbiol. Immunol. 20:1-6). M. microti vaccination has since been discontinued because it was no more effective than the widely used BCG vaccine. As a result, improved vaccines are needed for preventing and treating tuberculosis.

The difficulty in attempting to improve this live vaccine is that the molecular mechanisms of both the attenuation and the immunogenicity of BCG are still poorly understood. Comparative genomic studies of all six members of the M. tuberculosis complex have identified more than 140 genes whose presence is facultative and that may confer differences in phenotype, host range and virulence. Relative to the genome of the paradigm strain, M. tuberculosis H37Rv (S. T. Cole, et al., Nature 393, 537 (1998)), many of these genes occur in chromosomal regions that have been deleted from certain species (RD1-16, RvD1-5) (M. A. Behr, et al., Science 284, 1520 (1999); R. Brosch, et al., Infect. Immun. 66, 2221 (1998); S. V. Gordon, et al., Mol. Microbiol. 32, 643 (1999); H. Salamon, et al., Genome Res. 10, 2044 (2000); G. G. Mahairas, et al., J. Bacteriol. 178, 1274 (1996); and R. Brosch, et al., Proc. Natl. Acad. Sci. USA 99, 3684 (2002)).

In connection with the invention, and based on their distribution among tubercle bacilli and their potential to encode virulence functions, RD1, RD3-5, RD7 and RD9 (Fig. 1A, B) were accorded highest priority for functional genomic analysis using "knock-ins" of M. bovis BCG to assess their potential contribution to the attenuation process. Clones spanning these RD regions were selected from an ordered M. tuberculosis H37Rv library of integrating shuttle cosmids (S. T. Cole, et al., Nature 393, 537 (1998); W. R. Bange, et al., Tuber. Lung Dis. 79, 171 (1999)) and individually electroporated into BCG Pasteur, where they inserted stably into the attB site (M. H. Lee, et al., Proc. Natl. Acad. Sci. USA 88, 3111 (1991)).



WO 03/085098 




PCT/IB03/01789 




We found that only reintroduction of all or part of RD1-2F9 led to profound phenotypic alteration. Strikingly, the BCG::RD1 "knock-in" grew more vigorously than BCG controls in immunodeficient mice, inducing extensive splenomegaly and granuloma formation.

The RD1 deletion is restricted to the avirulent strains M. bovis BCG and M. microti. Although the endpoints are not identical, the deletions have removed from both vaccine strains a cluster of six genes (Rv3871-Rv3876) that are part of the ESAT-6 locus (Fig. 1A; S. T. Cole, et al., Nature 393, 537 (1998); F. Tekaia, et al., Tubercle Lung Disease 79, 329 (1999)).

Among the missing products are members of the mycobacterial PE (Rv3872), PPE (Rv3873) and ESAT-6 (Rv3874, Rv3875) protein families. Despite lacking obvious secretion signals, ESAT-6 (Rv3875) and the related protein CFP-10 (Rv3874) are abundant components of short-term culture filtrate and act as immunodominant T-cell antigens that induce potent Th1 responses (F. Tekaia, et al., Tubercle Lung Disease 79, 329 (1999); A. L. Sorensen, et al., Infect. Immun. 63, 1710 (1995); R. Colangelli, et al., Infect. Immun. 68, 990 (2000)).

In summary, we have discovered that the restoration of RD1-2F9 to M. bovis BCG leads to increased persistence in immunocompetent mice. The M. bovis BCG::RD1 strain induces RD1-specific immune responses of the Th1 type, has enhanced immunogenicity and confers better protection than M. bovis BCG alone in animal models of tuberculosis. The M. bovis BCG::RD1 vaccine is significantly more virulent than M. bovis BCG in immunodeficient mice but considerably less virulent than M. tuberculosis.

In addition, we show that M. microti lacks a part of the RD1 region (RD1mic) that differs from, but overlaps with, the part missing from M. bovis BCG, and our results indicate that reintroduction of RD1-2F9 confers increased virulence on BCG::RD1 in immunodeficient mice. The rare strains of M. microti that are associated with human disease contain a region referred to as RD5mic, whereas those from voles do not.

The M. bovis BCG vaccine could be improved by reintroducing other genes encoding ESAT-6 family members that have been lost, notably those found in the RD8 and RD5 loci of M. tuberculosis. These regions also code for additional T-cell antigens.

M. bovis BCG::RD1 could be improved by reintroducing the RD8 and RD5 loci of M. tuberculosis.

The M. bovis BCG vaccine could be improved by reintroducing and overexpressing the genes contained in the RD1, RD5 and RD8 regions.



Accordingly, these new strains, showing greater persistence and enhanced immunogenicity, represent an improved vaccine for preventing tuberculosis and treating bladder cancer.



In addition, the greater persistence of these recombinant strains is an advantage for the presentation of other antigens, for instance from HIV in humans, in order to induce protective immune responses. These improved strains may also be of use in veterinary medicine, for instance in preventing bovine tuberculosis.



Description 



Therefore, the present invention is aimed at a strain of M. bovis BCG or M. microti, wherein said strain has integrated all or part of the RD1-2F9 region as shown in SEQ ID No 1, responsible for enhanced immunogenicity to the tubercle bacilli. These strains will be referred to as the M. bovis BCG::RD1 or M. microti::RD1 strains.



In connection with the invention, "part or all of the RD1-2F9 region" means that the strain has integrated a portion of DNA originating from Mycobacterium tuberculosis or any virulent member of the Mycobacterium tuberculosis complex (M. africanum, M. bovis, M. canettii), which comprises at least one, two, three, four, five or more gene(s) selected from Rv3861 (SEQ ID No 4), Rv3862 (SEQ ID No 5), Rv3863 (SEQ ID No 6), Rv3864 (SEQ ID No 7), Rv3865 (SEQ ID No 8), Rv3866 (SEQ ID No 9), Rv3867 (SEQ ID No 10), Rv3868 (SEQ ID No 11), Rv3869 (SEQ ID No 12), Rv3870 (SEQ ID No 13), Rv3871 (SEQ ID No 14), Rv3872 (SEQ ID No 15, mycobacterial PE), Rv3873 (SEQ ID No 16, PPE), Rv3874 (SEQ ID No 17, CFP-10), Rv3875 (SEQ ID No 18, ESAT-6), Rv3876 (SEQ ID No 19), Rv3877 (SEQ ID No 20), Rv3878 (SEQ ID No 21), Rv3879 (SEQ ID No 22), Rv3880 (SEQ ID No 23), Rv3881 (SEQ ID No 24), Rv3882 (SEQ ID No 25), Rv3883 (SEQ ID No 26), Rv3884 (SEQ ID No 27) and Rv3885 (SEQ ID No 28). The expression "a portion of DNA" also means a nucleotide sequence, a nucleic acid or a polynucleotide. The term "gene" refers herein to the coding sequence in frame with its natural promoter, as well as to the coding sequence that has been isolated and placed under the control of an exogenous promoter, for example a promoter capable of directing a high level of expression of said coding sequence.
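The SEQ ID numbering of the RD1-2F9 genes listed above is regular: each gene from Rv3861 to Rv3885 carries SEQ ID No equal to its Rv number minus 3857. The following Python sketch, whose helper names are hypothetical, encodes that mapping together with the product names given in the text; it may help when checking a gene list against the sequence listing.

```python
# Hypothetical helper: map the RD1-2F9 genes Rv3861..Rv3885 to the SEQ ID
# numbers used in this description (Rv3861 -> 4, ..., Rv3885 -> 28).
RD1_2F9_GENES = [f"Rv{n}" for n in range(3861, 3886)]

# Product names given in the text for selected genes.
PRODUCTS = {
    "Rv3872": "mycobacterial PE",
    "Rv3873": "PPE",
    "Rv3874": "CFP-10",
    "Rv3875": "ESAT-6",
}

def seq_id(gene):
    """Return the SEQ ID No assigned to an RD1-2F9 gene."""
    if gene not in RD1_2F9_GENES:
        raise ValueError(f"{gene} is not one of Rv3861-Rv3885")
    return int(gene[2:]) - 3857

def describe(gene):
    """Format a gene the way it is cited in the text."""
    label = f"SEQ ID No {seq_id(gene)}"
    if gene in PRODUCTS:
        label += f", {PRODUCTS[gene]}"
    return f"{gene} ({label})"

print(len(RD1_2F9_GENES))   # 25
print(describe("Rv3875"))   # Rv3875 (SEQ ID No 18, ESAT-6)
```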

In a specific aspect, the invention relates to a strain of M. bovis BCG or M. microti wherein said strain has integrated at least one, two, three or more gene(s) selected from Rv3867 (SEQ ID No 10), Rv3868 (SEQ ID No 11), Rv3869 (SEQ ID No 12), Rv3870 (SEQ ID No 13), Rv3871 (SEQ ID No 14), Rv3872 (SEQ ID No 15, mycobacterial PE), Rv3873 (SEQ ID No 16, PPE), Rv3874 (SEQ ID No 17, CFP-10), Rv3875 (SEQ ID No 18, ESAT-6), Rv3876 (SEQ ID No 19) and Rv3877 (SEQ ID No 20).

In another specific aspect, the invention relates to a strain of M. bovis BCG or M. microti wherein said strain has integrated at least one, two, three or more gene(s) selected from Rv3871 (SEQ ID No 14), Rv3872 (SEQ ID No 15, mycobacterial PE), Rv3873 (SEQ ID No 16, PPE), Rv3874 (SEQ ID No 17, CFP-10), Rv3875 (SEQ ID No 18, ESAT-6) and Rv3876 (SEQ ID No 19).

Preferably, a strain according to the invention is one which has integrated a portion of DNA originating from Mycobacterium tuberculosis or any virulent member of the Mycobacterium tuberculosis complex (M. africanum, M. bovis, M. canettii), which comprises at least four genes selected from Rv3861 (SEQ ID No 4), Rv3862 (SEQ ID No 5), Rv3863 (SEQ ID No 6), Rv3864 (SEQ ID No 7), Rv3865 (SEQ ID No 8), Rv3866 (SEQ ID No 9), Rv3867 (SEQ ID No 10), Rv3868 (SEQ ID No 11), Rv3869 (SEQ ID No 12), Rv3870 (SEQ ID No 13), Rv3871 (SEQ ID No 14), Rv3872 (SEQ ID No 15, mycobacterial PE), Rv3873 (SEQ ID No 16, PPE), Rv3874 (SEQ ID No 17, CFP-10), Rv3875 (SEQ ID No 18, ESAT-6), Rv3876 (SEQ ID No 19), Rv3877 (SEQ ID No 20), Rv3878 (SEQ ID No 21), Rv3879 (SEQ ID No 22), Rv3880 (SEQ ID No 23), Rv3881 (SEQ ID No 24), Rv3882 (SEQ ID No 25), Rv3883 (SEQ ID No 26), Rv3884 (SEQ ID No 27) and Rv3885 (SEQ ID No 28), provided that it comprises Rv3874 (SEQ ID No 17, CFP-10) and/or Rv3875 (SEQ ID No 18, ESAT-6).
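The "at least four genes, provided that it comprises Rv3874 and/or Rv3875" condition above is a simple set rule. The Python sketch below, with a hypothetical function name, checks a candidate gene list against it; it illustrates the wording of the embodiment only and is not a legal interpretation of the claims.

```python
# Hypothetical check of the "at least four genes" embodiment described above.
RD1_2F9_GENES = {f"Rv{n}" for n in range(3861, 3886)}   # Rv3861 .. Rv3885
CFP10_OR_ESAT6 = {"Rv3874", "Rv3875"}                    # CFP-10, ESAT-6

def meets_four_gene_embodiment(genes):
    """True if `genes` holds >= 4 RD1-2F9 genes including Rv3874 and/or Rv3875."""
    in_region = set(genes) & RD1_2F9_GENES  # genes outside RD1-2F9 do not count
    return len(in_region) >= 4 and bool(in_region & CFP10_OR_ESAT6)

# The genes carried by the RD1-AP34 insert (Rv3872-Rv3875) satisfy the condition,
print(meets_four_gene_embodiment(["Rv3872", "Rv3873", "Rv3874", "Rv3875"]))  # True
# whereas four upstream genes without CFP-10 or ESAT-6 do not.
print(meets_four_gene_embodiment(["Rv3861", "Rv3862", "Rv3863", "Rv3864"]))  # False
```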

Strains which have integrated a portion of DNA originating from Mycobacterium tuberculosis or any virulent member of the Mycobacterium tuberculosis complex (M. africanum, M. bovis, M. canettii) comprising at least Rv3871 (SEQ ID No 14), Rv3875 (SEQ ID No 18, ESAT-6) and Rv3876 (SEQ ID No 19), or at least Rv3871 (SEQ ID No 14), Rv3875 (SEQ ID No 18, ESAT-6) and Rv3877 (SEQ ID No 20), or at least Rv3871 (SEQ ID No 14), Rv3875 (SEQ ID No 18, ESAT-6), Rv3876 (SEQ ID No 19) and Rv3877 (SEQ ID No 20) are of particular interest.
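Because each of the three combinations above is introduced by "at least", an integrated fragment qualifies whenever it contains the listed genes, even if it also carries others. A minimal Python sketch of that subset test, with hypothetical names, could look like this.

```python
# The three "at least" combinations singled out above, as sets of genes.
CORE_COMBINATIONS = [
    {"Rv3871", "Rv3875", "Rv3876"},
    {"Rv3871", "Rv3875", "Rv3877"},
    {"Rv3871", "Rv3875", "Rv3876", "Rv3877"},
]

def matching_combinations(genes):
    """Return the 1-based indices of the combinations fully contained in `genes`.

    "At least" allows extra genes, so this is a subset test (combo <= present)
    rather than an equality test.
    """
    present = set(genes)
    return [i for i, combo in enumerate(CORE_COMBINATIONS, start=1) if combo <= present]

# Rv3871-Rv3876, the six-gene cluster missing from both vaccine strains:
print(matching_combinations([f"Rv{n}" for n in range(3871, 3877)]))  # [1]
```

The third combination is a superset of the first two, so any fragment that matches it also matches combinations 1 and 2.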

The above strains according to the invention may further comprise Rv3874 (SEQ ID No 17, CFP-10), Rv3872 (SEQ ID No 15, mycobacterial PE) and/or Rv3873 (SEQ ID No 16, PPE). In addition, they may further comprise at least one, two, three or four gene(s) selected from Rv3861 (SEQ ID No 4), Rv3862 (SEQ ID No 5), Rv3863 (SEQ ID No 6), Rv3864 (SEQ ID No 7), Rv3865 (SEQ ID No 8), Rv3866 (SEQ ID No 9), Rv3867 (SEQ ID No 10), Rv3868 (SEQ ID No 11), Rv3869 (SEQ ID No 12), Rv3870 (SEQ ID No 13), Rv3878 (SEQ ID No 21), Rv3879 (SEQ ID No 22), Rv3880 (SEQ ID No 23), Rv3881 (SEQ ID No 24), Rv3882 (SEQ ID No 25), Rv3883 (SEQ ID No 26), Rv3884 (SEQ ID No 27) and Rv3885 (SEQ ID No 28).

The invention encompasses strains which have integrated a portion of DNA originating from Mycobacterium tuberculosis or any virulent member of the Mycobacterium tuberculosis complex (M. africanum, M. bovis, M. canettii) which comprises Rv3875 (SEQ ID No 18, ESAT-6), Rv3874 (SEQ ID No 17, CFP-10), or both Rv3875 (SEQ ID No 18, ESAT-6) and Rv3874 (SEQ ID No 17, CFP-10).

These genes can be mutated (by deletion, insertion or base modification) so as to maintain the improved immunogenicity while decreasing the virulence of the strains. Using routine procedures, the man skilled in the art can select M. bovis BCG::RD1 or M. microti::RD1 strains in which a mutated gene has been integrated and which show improved immunogenicity and lower virulence.

We have shown here that introduction of the RD1-2F9 region makes the vaccine strains induce a more effective immune response against a challenge with M. tuberculosis. However, this first generation of constructs can be followed by further, more finely tuned generations of constructs, because the complemented BCG::RD1 vaccine strain also showed a more virulent phenotype in severely immunocompromised (SCID) mice. Therefore, the BCG::RD1 constructs may be modified so as to be applicable as vaccine strains while being safe for immunocompromised individuals. The term "construct" means an engineered gene unit, usually involving a gene of interest that has been fused to a promoter.



In this perspective, the man skilled in the art can adapt the BCG::RD1 strain by designing BCG vaccine strains that carry only parts of the genes coding for ESAT-6 or CFP-10 in a mycobacterial expression vector (for example pSM81) under the control of a promoter, more particularly an hsp60 promoter. For example, at least one portion of the esat-6 gene that codes for immunogenic 20-mer peptides of ESAT-6 active as T-cell epitopes (Mustafa AS, Oftung F, Amoudy HA, Madi NM, Abal AT, Shaban F, Rosenkrands I, & Andersen P. (2000) Multiple epitopes from the Mycobacterium tuberculosis ESAT-6 antigen are recognized by antigen-specific human T cell lines. Clin. Infect. Dis. 30 Suppl. 3:S201-5; peptides P1 to P8 are incorporated herein in the description) could be cloned into this vector and electroporated into BCG, resulting in a BCG strain that produces these epitopes.
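To choose which portion of esat-6 to clone, each candidate 20-mer can be mapped back to the nucleotides that encode it. The Python sketch below tiles the ESAT-6 protein into overlapping 20-residue windows and returns the corresponding positions in SEQ ID No 3. The protein string is the translation of positions 2151-2435 of SEQ ID No 3 with the standard genetic code; the 10-residue step and the function names are hypothetical choices, so these windows are not the peptides P1 to P8 of Mustafa et al.

```python
# ESAT-6 (Rv3875), 95 residues, translated from positions 2151-2435 of SEQ ID No 3.
ESAT6 = ("MTEQQWNFAGIEAAASAIQGNVTSIHSLLDEGKQSLTKLAAAWGGSGSEAYQGVQQKWDATATEL"
         "NNALQNLARTISEAGQAMASTEGNVTGMFA")
CDS_START = 2151  # first base of the esat-6 ATG in SEQ ID No 3

def tile_peptides(protein, length=20, step=10):
    """Split a protein into overlapping windows of `length` residues.

    `step` is a hypothetical spacing. A final window is added so that the
    C-terminus is always covered. Returns (first, last, peptide), 1-based.
    """
    if len(protein) <= length:
        return [(1, len(protein), protein)]
    starts = list(range(0, len(protein) - length + 1, step))
    if starts[-1] != len(protein) - length:
        starts.append(len(protein) - length)
    return [(s + 1, s + length, protein[s:s + length]) for s in starts]

def nucleotide_span(first_residue, last_residue, cds_start=CDS_START):
    """1-based positions in SEQ ID No 3 of the codons for residues first..last."""
    return (cds_start + 3 * (first_residue - 1), cds_start + 3 * last_residue - 1)

for first, last, peptide in tile_peptides(ESAT6):
    print(first, last, peptide, nucleotide_span(first, last))
```

The last window ends at residue 95, whose codon ends at position 2435, the end coordinate given for Rv3875 in the annotation of SEQ ID No 3.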

Alternatively, the ESAT-6 and CFP-10 encoding genes (for example on plasmid RD1-AP34 and/or RD1-2F9) could be altered by directed mutagenesis (using, for example, the QuikChange Site-Directed Mutagenesis Kit from Stratagene) in such a way that most of the immunogenic peptides of ESAT-6 remain intact but the biological functionality of ESAT-6 is lost.

This approach could result in a more protective BCG vaccine without increasing the virulence of the recombinant BCG strain.


Therefore, the invention is also aimed at a method for preparing and selecting M. bovis BCG or M. microti recombinant strains, comprising a step of modifying the M. bovis BCG::RD1 or M. microti::RD1 strains as defined above by insertion, deletion or mutation in the integrated RD1 region, more particularly in the esat-6 or CFP-10 gene, said method leading to strains that are less virulent for immunodepressed individuals. Together, these methods would make it possible to determine whether the effect seen with our BCG::RD1 strain is caused by the presence of additional T-cell epitopes from ESAT-6 and CFP-10 resulting in increased immunogenicity, by better fitness of the recombinant BCG::RD1 clones resulting in longer exposure of the immune system to the vaccine, or by a combination of both factors.

In a preferred embodiment, the invention is aimed at the M. bovis BCG::RD1 strains which have integrated one of the cosmids herein referred to as RD1-2F9 and RD1-AP34, contained in the E. coli strains deposited on April 2, 2002 at the CNCM (Institut Pasteur, 25, rue du Docteur Roux, 75724 Paris cedex 15, France) under the accession numbers I-2831 and I-2832, respectively. RD1-2F9 is a cosmid comprising the portion of the Mycobacterium tuberculosis H37Rv genome previously named RD1-2F9, which spans the RD1 region, and a gene conferring resistance to kanamycin. RD1-AP34 is a cosmid comprising a portion of the Mycobacterium tuberculosis H37Rv genome containing the two genes coding for ESAT-6 and CFP-10, as well as a gene conferring resistance to kanamycin.

The cosmid RD1-AP34 contains a 3909 bp fragment of the M. tuberculosis H37Rv genome, from position 4350459 bp to 4354367 bp, that has been cloned into the integrating vector pKint in order to be integrated into the genome of Mycobacterium bovis BCG and Mycobacterium microti strains (SEQ ID No 3). The Accession No. of segment 160 of the M. tuberculosis H37Rv genome that contains this region is AL022120.
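The stated fragment size follows from the inclusive coordinates: 4354367 - 4350459 + 1 = 3909 bp. The Python sketch below makes that arithmetic explicit and adds a hypothetical helper that converts a position in SEQ ID No 3 to an H37Rv coordinate, assuming that position 1 of SEQ ID No 3 corresponds to 4350459 and that the sequence is given in the forward orientation.

```python
# Coordinates of the RD1-AP34 insert in the M. tuberculosis H37Rv genome, from the text.
INSERT_START, INSERT_END = 4350459, 4354367

def inclusive_length(start, end):
    """Length in bp of a 1-based interval that includes both end points."""
    return end - start + 1

def to_genome(position, insert_start=INSERT_START):
    """Map a 1-based position in SEQ ID No 3 to an H37Rv coordinate.

    Assumption: SEQ ID No 3 starts at INSERT_START and runs in the forward direction.
    """
    return insert_start + position - 1

print(inclusive_length(INSERT_START, INSERT_END))  # 3909
print(to_genome(3909) == INSERT_END)               # True
```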


SEQ ID No 3: 

1 - gaattcccat ccagtgagtt caaggtcaag cggcgccccc ctggccaggc atttctcgtc 
61 - tcgccagacg gcaaagaggt catccaggcc ccctacatcg agcctccaga agaagtgttc
121 - gcagcacccc caagcgccgg ttaagattat ttcattgccg gtgtagcagg acccgagctc 
181 - agcccggtaa tcgagttcgg gcaatgctga ccatcgggtt tgtttccggc tataaccgaa 
241 - cggtttgtgt acgggataca aatacaggga gggaagaagt aggcaaatgg aaaaaatgtc 
301 - acatgatccg atcgctgccg acattggcac gcaagtgagc gacaacgctc tgcacggcgt
361 - gacggccggc tcgacggcgc tgacgtcggt gaccgggctg gttcccgcgg gggccgatga 
421 - ggtctccgcc caagcggcga cggcgttcac atcggagggc atccaattgc tggcttccaa 
481 - tgcatcggcc caagaccagc tccaccgtgc gggcgaagcg gtccaggacg tcgcccgcac 

541 - ctattcgcaa atcgacgacg gcgccgccgg cgtcttcgcc gaataggccc ccaacacatc
601 - ggagggagtg atcaccatgc tgtggcacgc aatgccaccg gagctaaata ccgcacggct
661 - gatggccggc gcgggtccgg ctccaatgct tgcggcggcc gcgggatggc agacgctttc 
721 - ggcggctctg gacgctcagg ccgtcgagtt gaccgcgcgc ctgaactctc tgggagaagc
781 - ctggactgga ggtggcagcg acaaggcgct tgcggctgca acgccgatgg tggtctggct

841 - acaaaccgcg tcaacacagg ccaagacccg tgcgatgcag gcgacggcgc aagccgcggc
901 - atacacccag gccatggcca cgacgccgtc gctgccggag atcgccgcca accacatcac 
961 - ccaggccgtc cttacggcca ccaacttctt cggtatcaac acgatcccga tcgcgttgac
1021 - cgagatggat tatttcatcc gtatgtggaa ccaggcagcc ctggcaatgg aggtctacca
1081 - ggccgagacc gcggttaaca cgcttttcga gaagctcgag ccgatggcgt cgatccttga 

1141 - tcccggcgcg agccagagca cgacgaaccc gatcttcgga atgccctccc ctggcagctc
1201 - aacaccggtt ggccagttgc cgccggcggc tacccagacc ctcggccaac tgggtgagat
1261 - gagcggcccg atgcagcagc tgacccagcc gctgcagcag gtgacgtcgt tgttcagcca
1321 - ggtgggcggc accggcggcg gcaacccagc cgacgaggaa gccgcgcaga tgggcctgct
1381 - cggcaccagt ccgctgtcga accatccgct ggctggtgga tcaggcccca gcgcgggcgc 

1441 - gggcctgctg cgcgcggagt cgctacctgg cgcaggtggg tcgttgaccc gcacgccgct
1501 - gatgtctcag ctgatcgaaa agccggttgc cccctcggtg atgccggcgg ctgctgccgg
1561 - atcgtcggcg acgggtggcg ccgctccggt gggtgcggga gcgatgggcc agggtgcgca
1621 - atccggcggc tccaccaggc cgggtctggt cgcgccggca ccgctcgcgc aggagcgtga 
1681 - agaagacgac gaggacgact gggacgaaga ggacgactgg tgagctcccg taatgacaac

1741 - agacttcccg gccacccggg ccggaagact tgccaacatt ttggcgagga aggtaaagag

1801 - agaaagtagt ccagcatggc agagatgaag accgatgccg ctaccctcgc gcaggaggca 
1861 - ggtaatttcg agcggatctc cggcgacctg aaaacccaga tcgaccaggt ggagtcgacg 
1921 - gcaggttcgt tgcagggcca gtggcgcggc gcggcgggga cggccgccca ggccgcggtg 






1981 - gtgcgcttcc aagaagcagc caataagcag aagcaggaac tcgacgagat ctcgacgaat 
2041 - attcgtcagg ccggcgtcca atactcgagg gccgacgagg agcagcagca ggcgctgtcc 
2101 - tcgcaaatgg gcttctgacc cgctaatacg aaaagaaacg gagcaaaaac atgacagagc
2161 - agcagtggaa tttcgcgggt atcgaggccg cggcaagcgc aatccaggga aatgtcacgt
2221 - ccattcattc cctccttgac gaggggaagc agtccctgac caagctcgca gcggcctggg
2281 - gcggtagcgg ttcggaggcg taccagggtg tccagcaaaa atgggacgcc acggctaccg
2341 - agctgaacaa cgcgctgcag aacctggcgc ggacgatcag cgaagccggt caggcaatgg
2401 - cttcgaccga aggcaacgtc actgggatgt tcgcataggg caacgccgag ttcgcgtaga
2461 - atagcgaaac acgggatcgg gcgagttcga ccttccgtcg gtctcgccct ttctcgtgtt 
2521 - tatacgtttg agcgcactct gagaggttgt catggcggcc gactacgaca agctcttccg 
2581 - gccgcacgaa ggtatggaag ctccggacga tatggcagcg cagccgttct tcgaccccag 
2641 - tgcttcgttt ccgccggcgc ccgcatcggc aaacctaccg aagcccaacg gccagactcc 
2701 - gcccccgacg tccgacgacc tgtcggagcg gttcgtgtcg gccccgccgc cgccaccccc 
2761 - acccccacct ccgcctccgc caactccgat gccgatcgcc gcaggagagc cgccctcgcc 
2821 - ggaaccggcc gcatctaaac cacccacacc ccccatgccc atcgccggac ccgaaccggc 
2881 - cccacccaaa ccacccacac cccccatgcc catcgccgga cccgaaccgg ccccacccaa 
2941 - accacccaca cctccgatgc ccatcgccgg acctgcaccc accccaaccg aatcccagtt 
3001 - ggcgcccccc agaccaccga caccacaaac gccaaccgga gcgccgcagc aaccggaatc
3061 - accggcgccc cacgtaccct cgcacgggcc acatcaaccc cggcgcaccg caccagcacc 
3121 - gccctgggca aagatgccaa tcggcgaacc cccgcccgct ccgtccagac cgtctgcgtc 
3181 - cccggccgaa ccaccgaccc ggcctgcccc ccaacactcc cgacgtgcgc gccggggtca 
3241 - ccgctatcgc acagacaccg aacgaaacgt cgggaaggta gcaactggtc catccatcca 
3301 - ggcgcggctg cgggcagagg aagcatccgg cgcgcagctc gcccccggaa cggagccctc 
3361 - gccagcgccg ttgggccaac cgagatcgta tctggctccg cccacccgcc ccgcgccgac 
3421 - agaacctccc cccagcccct cgccgcagcg caactccggt cggcgtgccg agcgacgcgt 
3481 - ccaccccgat ttagccgccc aacatgccgc ggcgcaacct gattcaatta cggccgcaac 
3541 - cactggcggt cgtcgccgca agcgtgcagc gccggatctc gacgcgacac agaaatcctt 
3601 - aaggccggcg gccaaggggc cgaaggtgaa gaaggtgaag ccccagaaac cgaaggccac 
3661 - gaagccgccc aaagtggtgt cgcagcgcgg ctggcgacat tgggtgcatg cgttgacgcg
3721 - aatcaacctg ggcctgtcac ccgacgagaa gtacgagctg gacctgcacg ctcgagtccg 
3781 - ccgcaatccc cgcgggtcgt atcagatcgc cgtcgtcggt ctcaaaggtg gggctggcaa 
3841 - aaccacgctg acagcagcgt tggggtcgac gttggctcag gtgcgggccg accggatcct 
3901 - ggctctaga

pos. 0001-0006 EcoRI restriction site
pos. 0286-0583 Rv3872 coding for a PE protein (SEQ ID No 15)
pos. 0616-1720 Rv3873 coding for a PPE protein (SEQ ID No 16)
pos. 1816-2115 Rv3874 coding for the 10 kDa culture filtrate protein (CFP-10) (SEQ ID No 17)
pos. 2151-2435 Rv3875 coding for the 6 kDa early secreted antigenic target (ESAT-6) (SEQ ID No 18)
pos. 3904-3909 XbaI restriction site
pos. 1816-2435 CFP-10 gene + esat-6 gene (SEQ ID No 29)
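The two restriction sites that bound the insert can be located by scanning for their recognition sequences, GAATTC for EcoRI and TCTAGA for XbaI. The Python sketch below, with hypothetical names, does this with a regular-expression lookahead so that overlapping hits are also reported. Because only the ends of SEQ ID No 3 matter here, the demonstration string keeps the real first 10 and last 9 bases and pads the middle with N so that positions match the 3909 bp sequence.

```python
import re

# Recognition sequences of the two enzymes in the annotation. Both are palindromic,
# so scanning one strand also finds sites on the complementary strand.
SITES = {"EcoRI": "GAATTC", "XbaI": "TCTAGA"}

def find_sites(seq):
    """Return {enzyme: [1-based start positions]} for every recognition site in seq."""
    seq = seq.upper()
    return {
        enzyme: [m.start() + 1 for m in re.finditer(f"(?={motif})", seq)]
        for enzyme, motif in SITES.items()
    }

# Stand-in for SEQ ID No 3: real ends, N padding in between (3909 bp in total).
demo = "gaattcccat" + "n" * 3890 + "ggctctaga"
print(len(demo), find_sites(demo))  # 3909 {'EcoRI': [1], 'XbaI': [3904]}
```

On this stand-in, the scan places the EcoRI site at positions 1-6 and the XbaI site at positions 3904-3909, at the two ends of the insert.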

These sequences can be completed with genes Rv3861 to Rv3871 and Rv3876 to Rv3885, as listed in Table 1 below.



Table 1

Gene | Gene length (bp) | Protein length (aa) | Gene type | NCBI accession (NC = gene, NP = protein) | Loc. in M. tuberculosis H37Rv (kb) | Coordinates in M. tuberculosis H37Rv | Molecular mass of protein (Da) | Description
Rv3861 | 324 | 108 | CDS | - | 4337.95 | 4337946..4338269 | 11643.42 | hypothetical protein
Rv3862c (whiB6) | 348 | 116 | CDS | - | 4338.52 | compl. 4338174..4338521 | 12792.38 | possible transcriptional regulatory protein, WhiB-like WhiB6
Rv3863 | 1176 | 392 | CDS | - | 4338.85 | 4338849..4340024 | 41087.44 | hypothetical alanine-rich protein
Rv3864 | 1206 | 402 | CDS | - | 4340.27 | 4340270..4341475 | 42068.66 | conserved hypothetical protein
Rv3865 | 309 | 103 | CDS | - | 4341.57 | 4341566..4341874 | 10618.01 | conserved hypothetical protein
Rv3866 | 849 | 283 | CDS | - | 4341.88 | 4341880..4342728 | 30064.04 | conserved hypothetical protein
Rv3867 | 549 | 183 | CDS | NC_000962 / NP_218384 | 4342.77 | 4342767..4343318 | 19945.52 | conserved protein
Rv3868 | 1719 | 573 | CDS | NC_000962 / NP_218385 | 4343.31 | 4343311..4345032 | 62425.40 | conserved protein
Rv3869 | 1440 | 480 | CDS | NC_000962 / NP_218386 | 4345.04 | 4345036.. | 51092.58 | possible conserved membrane protein
Rv3870 | 2241 | 747 | CDS | NC_000962 / NP_218387 | 4346.48 | 4346478.. | 80912.76 | possible membrane protein
Rv3871 | 1773 | 591 | CDS | NC_000962 / NP_218388 | 4348.83 | 4348824.. | 64560.65 | conserved protein
Rv3876 | 1998 | 666 | CDS | NC_000962 / NP_218393 | 4353.01 | 4353007..4355007 | 70644.92 | conserved proline- and alanine-rich protein
Rv3877 | 1533 | 511 | CDS | NC_000962 / NP_218394 | 4355.01 | 4355004.. | 53981.12 | probable transmembrane protein
Rv3878 | 840 | 280 | CDS | NC_000962 | 4356.69 | 4356693..4357532 | 27395.23 | conserved hypothetical alanine-rich protein
Rv3879c | 2187 | 729 | CDS | NC_000962 | 4359.78 | compl. 4357596..4359782 | 74492.13 | hypothetical alanine- and proline-rich protein
Rv3880c | 345 | 115 | CDS | NC_000962 | 4360.55 | compl. 4360202..4360546 | 12167.51 | conserved hypothetical protein
Rv3881c | 1380 | 460 | CDS | NC_000962 | 4361.92 | compl. 4360546..4361925 | 47593.62 | conserved hypothetical alanine- and glycine-rich protein
Rv3882c | 1386 | 462 | CDS | NC_000962 | 4363.42 | compl. 4362035..4363420 | 50396.58 | possible conserved membrane protein
Rv3883c | 1338 | 446 | CDS | NC_000962 | 4364.76 | compl. 4363420..4364757 | 45085.89 | possible secreted protease
Rv3884c | 1857 | 619 | CDS | NC_000962 | 4366.84 | compl. 4364982..4366838 | 68040.97 | probable CBXX/CFQX family protein
Rv3885c | 1611 | 537 | CDS | NC_000962 | 4368.52 | compl. 4366911..4368521 | 57637.95 | possible conserved membrane protein



The sequence of the fragment RD1-2F9 (~32 kb) covers the region of the M. tuberculosis genome AL123456 from ca. 4337 kb to ca. 4369 kb, and also contains the sequence described in SEQ ID No 1. Therefore, the invention also embraces the M. bovis BCG::RD1 and M. microti::RD1 strains which have integrated the sequence shown in SEQ ID No 1.
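The coordinates in Table 1 can be cross-checked against the other columns and against the ca. 4337-4369 kb extent of RD1-2F9. The Python sketch below copies four rows of the table and checks that the inclusive coordinate span equals the listed gene length, that the listed protein length is one third of the gene length, and that each gene lies inside the window. It also assumes, as the table appears to show, that the "Loc. (kb)" column gives the start-codon coordinate, which for "compl." genes is the higher coordinate; that interpretation and the helper names are assumptions.

```python
# Four rows copied from Table 1: gene, start, end, strand ("-" = compl.),
# listed gene length (bp), listed protein length (aa).
ROWS = [
    ("Rv3861", 4337946, 4338269, "+", 324, 108),
    ("Rv3862c", 4338174, 4338521, "-", 348, 116),
    ("Rv3878", 4356693, 4357532, "+", 840, 280),
    ("Rv3885c", 4366911, 4368521, "-", 1611, 537),
]
RD1_2F9_WINDOW = (4337000, 4369000)  # ca. 4337-4369 kb, from the text

def span(start, end):
    """Inclusive length in bp of a coordinate range."""
    return end - start + 1

def loc_kb(start, end, strand):
    """Start-codon position in kb; on the complementary strand it is the higher coordinate."""
    return round((start if strand == "+" else end) / 1000, 2)

for gene, start, end, strand, gene_len, prot_len in ROWS:
    assert span(start, end) == gene_len      # coordinates agree with the gene length
    assert gene_len // 3 == prot_len         # listed protein length = gene length / 3
    assert RD1_2F9_WINDOW[0] <= start and end <= RD1_2F9_WINDOW[1]
    print(gene, loc_kb(start, end, strand))  # e.g. Rv3861 4337.95
```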



The above-described strains fulfill the aim of the invention, which is to provide an improved tuberculosis vaccine or M. bovis BCG-based prophylactic or therapeutic agent, or a recombinant M. microti derivative for these purposes.

The above-described M. bovis BCG::RD1 strains are better tuberculosis vaccines than M. bovis BCG. These strains can also be improved by reintroducing other genes found in the RD8 and RD5 loci of M. tuberculosis or any virulent member of the Mycobacterium tuberculosis complex (M. africanum, M. bovis, M. canettii). These regions code for additional T-cell antigens.

As indicated, overexpressing the genes contained in the RD1, RD5 and RD8 regions by means of exogenous promoters is encompassed by the invention. The same applies to the M. microti::RD1 strains. M. microti strains could also be improved by reintroducing the RD8 locus of M. tuberculosis or any virulent member of the Mycobacterium tuberculosis complex (M. africanum, M. bovis, M. canettii).


In a second embodiment, the invention is directed to a cosmid or a plasmid, more commonly named vectors, comprising all or part of the RD1-2F9 region originating from Mycobacterium tuberculosis or any virulent member of the Mycobacterium tuberculosis complex (M. africanum, M. bovis, M. canettii), said region comprising at least one, two, three or more gene(s) selected from Rv3861 (SEQ ID No 4), Rv3862 (SEQ ID No 5), Rv3863 (SEQ ID No 6), Rv3864 (SEQ ID No 7), Rv3865 (SEQ ID No 8), Rv3866 (SEQ ID No 9), Rv3867 (SEQ ID No 10), Rv3868 (SEQ ID No 11), Rv3869 (SEQ ID No 12), Rv3870 (SEQ ID No 13), Rv3871 (SEQ ID No 14), Rv3872 (SEQ ID No 15, mycobacterial PE), Rv3873 (SEQ ID No 16, PPE), Rv3874 (SEQ ID No 17, CFP-10), Rv3875 (SEQ ID No 18, ESAT-6), Rv3876 (SEQ ID No 19), Rv3877 (SEQ ID No 20), Rv3878 (SEQ ID No 21), Rv3879 (SEQ ID No 22), Rv3880 (SEQ ID No 23), Rv3881 (SEQ ID No 24), Rv3882 (SEQ ID No 25), Rv3883 (SEQ ID No 26), Rv3884 (SEQ ID No 27) and Rv3885 (SEQ ID No 28). The term "vector" refers to a DNA molecule originating from a virus, a bacterium, or the cell of a higher organism into which another DNA fragment of appropriate size can be integrated without loss of the vector's capacity for self-replication; a vector introduces foreign DNA into host cells, where it can be reproduced in large quantities. Examples are plasmids, cosmids and yeast artificial chromosomes; vectors are often recombinant molecules containing DNA sequences from several sources.



Preferably, a cosmid or a plasmid of the invention comprises a part of the RD1-2F9 region originating from Mycobacterium tuberculosis or any virulent member of the Mycobacterium tuberculosis complex (M. africanum, M. bovis, M. canettii), said part comprising at least one, two, three or more gene(s) selected from Rv3867 (SEQ ID No 10), Rv3868 (SEQ ID No 11), Rv3869 (SEQ ID No 12), Rv3870 (SEQ ID No 13), Rv3871 (SEQ ID No 14), Rv3872 (SEQ ID No 15, mycobacterial PE), Rv3873 (SEQ ID No 16, PPE), Rv3874 (SEQ ID No 17, CFP-10), Rv3875 (SEQ ID No 18, ESAT-6), Rv3876 (SEQ ID No 19) and Rv3877 (SEQ ID No 20).

Preferably, a cosmid or a plasmid of the invention comprises a part of the RD1-2F9 region originating from Mycobacterium tuberculosis or any virulent member of the Mycobacterium tuberculosis complex (M. africanum, M. bovis, M. canettii), said part comprising at least one, two, three or more gene(s) selected from Rv3872 (SEQ ID No 15, mycobacterial PE), Rv3873 (SEQ ID No 16, PPE), Rv3874 (SEQ ID No 17, CFP-10), Rv3875 (SEQ ID No 18, ESAT-6) and Rv3876 (SEQ ID No 19).

Preferably, a cosmid or a plasmid of the invention comprises CFP-10, ESAT-6 or both, or a part of them. It may also comprise a mutated gene selected from CFP-10, ESAT-6 or both, said mutated gene being responsible for the improved immunogenicity and decreased virulence.

A cosmid or a plasmid as mentioned above may comprise at least four genes selected from Rv3861 (SEQ ID No 4), Rv3862 (SEQ ID No 5), Rv3863 (SEQ ID No 6), Rv3864 (SEQ ID No 7), Rv3865 (SEQ ID No 8), Rv3866 (SEQ ID No 9), Rv3867 (SEQ ID No 10), Rv3868 (SEQ ID No 11), Rv3869 (SEQ ID No 12), Rv3870 (SEQ ID No 13), Rv3871 (SEQ ID No 14), Rv3872 (SEQ ID No 15, mycobacterial PE), Rv3873 (SEQ ID No 16, PPE), Rv3874 (SEQ ID No 17, CFP-10), Rv3875 (SEQ ID No 18, ESAT-6), Rv3876 (SEQ ID No 19), Rv3877 (SEQ ID No 20), Rv3878 (SEQ ID No 21), Rv3879 (SEQ ID No 22), Rv3880 (SEQ ID No 23), Rv3881 (SEQ ID No 24), Rv3882 (SEQ ID No 25), Rv3883 (SEQ ID No 26), Rv3884 (SEQ ID No 27) and Rv3885 (SEQ ID No 28), provided that it comprises Rv3874 (SEQ ID No 17, CFP-10) and/or Rv3875 (SEQ ID No 18, ESAT-6).

Advantageously, a cosmid or a plasmid of the invention comprises a portion of DNA 
originating from Mycobacterium tuberculosis or any virulent member of the 
Mycobacterium tuberculosis complex (M. africanum, M. bovis, M. canettii), which 
comprises at least Rv3871 (SEQ ID No 14), Rv3875 (SEQ ID No 18, ESAT-6) and 
Rv3876 (SEQ ID No 19), or at least Rv3871 (SEQ ID No 14), Rv3875 (SEQ ID No 18, 
ESAT-6) and Rv3877 (SEQ ID No 20), or at least Rv3871 (SEQ ID No 14), Rv3875 
(SEQ ID No 18, ESAT-6), Rv3876 (SEQ ID No 19) and Rv3877 (SEQ ID No 20). 

The above cosmids or plasmids may further comprise Rv3872 (SEQ ID No 15, 
mycobacterial PE), Rv3873 (SEQ ID No 16, PPE) and Rv3874 (SEQ ID No 17, CFP-10). They 
may also further comprise at least one, two, three or four gene(s) selected from Rv3861 
(SEQ ID No 4), Rv3862 (SEQ ID No 5), Rv3863 (SEQ ID No 6), Rv3864 (SEQ ID No 
7), Rv3865 (SEQ ID No 8), Rv3866 (SEQ ID No 9), Rv3867 (SEQ ID No 10), Rv3868 
(SEQ ID No 11), Rv3869 (SEQ ID No 12), Rv3870 (SEQ ID No 13), Rv3878 (SEQ ID 
No 21), Rv3879 (SEQ ID No 22), Rv3880 (SEQ ID No 23), Rv3881 (SEQ ID No 24), 
Rv3882 (SEQ ID No 25), Rv3883 (SEQ ID No 26), Rv3884 (SEQ ID No 27) and 
Rv3885 (SEQ ID No 28). 

Two particular cosmids of the invention are the cosmids herein referred to as RD1-2F9 and 
RD1-AP34, contained in the E. coli strains deposited at the CNCM (Institut Pasteur, 25, 
rue du Docteur Roux, 75724 Paris cedex 15, France) under the accession numbers I-2831 
and I-2832, respectively. 

A particular plasmid or cosmid of the invention is one which has integrated the complete 
RD1-2F9 region as shown in SEQ ID No 1. 
The invention also relates to the use of these cosmids or plasmids for transforming M. 
bovis BCG or M. microti strains. 

As indicated above, these cosmids or plasmids may comprise a mutated gene selected 
from Rv3861 to Rv3885, said mutated gene being responsible for the improved 
immunogenicity and decreased virulence. 

In another embodiment, the invention embraces a pharmaceutical composition 
comprising a strain as depicted above and a pharmaceutically acceptable carrier. 

In addition to the strains, these pharmaceutical compositions may contain suitable 
pharmaceutically acceptable carriers comprising excipients and auxiliaries which 
facilitate processing of the living vaccine into preparations which can be used 
pharmaceutically. Further details on techniques for formulation and administration may 
be found in the latest edition of Remington's Pharmaceutical Sciences (Mack 
Publishing Co., Easton, Pa.). 

Preferably, such composition is suitable for oral, intravenous or subcutaneous 
administration. 

The determination of the effective dose is well within the capability of those skilled in 
the art. A therapeutically effective dose refers to that amount of active ingredient, i.e., the 
number of bacteria of the strain administered, which ameliorates the symptoms or condition. 
Therapeutic efficacy and toxicity may be determined by standard pharmaceutical 
procedures in experimental animals, e.g., ED50 (the dose therapeutically effective in 
50% of the population) and LD50 (the dose lethal to 50% of the population). The dose 
ratio of toxic to therapeutic effects is the therapeutic index, and it can be expressed as the 
ratio LD50/ED50. Pharmaceutical compositions which exhibit large therapeutic indices 
are preferred. Of course, ED50 is to be modulated according to the mammal to be treated 
or vaccinated. In this regard, the invention contemplates a composition suitable for 
human administration as well as a veterinary composition. 
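The ED50, LD50 and therapeutic index arithmetic described above can be sketched as follows. This is a minimal illustration, not a procedure taken from the invention. The helper names `dose_at_50_percent` and `therapeutic_index` are hypothetical. Each 50% dose is assumed to be estimated by linear interpolation on log10(dose) between the two tested doses that bracket a 50% response, which is one simple choice among several (probit or logistic fitting are common alternatives). The dose-response figures in the usage lines are invented placeholders, not experimental data.

```python
import math


def dose_at_50_percent(doses, fractions):
    """Estimate the dose giving a 50% response (ED50 or LD50).

    Hypothetical helper: interpolates linearly on log10(dose) between the two
    adjacent doses whose response fractions bracket 0.5. Doses must be positive
    and increasing; fractions are response proportions in [0, 1].
    """
    if len(doses) != len(fractions) or len(doses) < 2:
        raise ValueError("need at least two paired dose/response points")
    pairs = list(zip(doses, fractions))
    for (d0, f0), (d1, f1) in zip(pairs, pairs[1:]):
        if f0 <= 0.5 <= f1 and f1 > f0:
            x0, x1 = math.log10(d0), math.log10(d1)
            x = x0 + (0.5 - f0) * (x1 - x0) / (f1 - f0)
            return 10 ** x
    raise ValueError("the 50% response is not bracketed by the data")


def therapeutic_index(ld50, ed50):
    """Therapeutic index as defined in the text: LD50 / ED50."""
    if ed50 <= 0:
        raise ValueError("ED50 must be positive")
    return ld50 / ed50


# Invented placeholder values (doses in CFU), for illustration only.
ed50 = dose_at_50_percent([1e4, 1e5, 1e6], [0.2, 0.6, 0.9])
ld50 = dose_at_50_percent([1e7, 1e8, 1e9], [0.0, 0.5, 1.0])
print(f"ED50 ~ {ed50:.3g} CFU, LD50 ~ {ld50:.3g} CFU, "
      f"TI ~ {therapeutic_index(ld50, ed50):.3g}")
```

A larger index means a wider margin between the protective and the lethal dose. This is why the text prefers compositions with large therapeutic indices.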

The invention is also aimed at a vaccine comprising a M bovis BCG::RD1 or M 
5 microtkiRDl strain as depicted above and a suitable carrier. This vaccine is especially 
useful for preventing tuberculosis. It can also be used for treating bladder cancer. 

The M. bovis BCG::RD1 or M. microti::RD1 strains are also useful as a carrier for the 
expression and presentation of foreign antigens or molecules of interest that are of 
therapeutic or prophylactic interest. Owing to its greater persistence, BCG::RD1 will 
present antigens to the immune system over a longer period, thereby inducing stronger, 
more robust immune responses and notably protective responses. Examples of such 
foreign antigens can be found in patents and patent applications US 6,191,270 for 
antigen LSA3, US 6,096,879 and US 5,314,808 for HBV antigens, EP 201,540 for HIV-1 
antigens, US 5,986,051 for H. pylori antigens and FR 2,744,724 for P. falciparum 
MSP-1 antigen. 

The invention also concerns a product comprising a strain as depicted above and at least 
one protein selected from ESAT-6 and CFP-10, or an epitope derived therefrom, for separate, 
simultaneous or sequential use in treating tuberculosis. 

In still another embodiment, the invention concerns the use of an M. bovis BCG::RD1 or 
M. microti::RD1 strain as depicted above for preventing or treating tuberculosis. 

It also concerns the use of an M. bovis BCG::RD1 or M. microti::RD1 strain as a powerful 
adjuvant/immunomodulator in the treatment of superficial bladder cancer. 

The invention also contemplates the identification at the species level of members of the 
M. tuberculosis complex by means of an RD-based molecular diagnostic test. Inclusion 
of markers for RD1mic and RD5mic would improve the tests and act as predictors of 
virulence, especially in humans. 

In this regard, the invention concerns a diagnostic kit for the identification at the species 
level of members of the M. tuberculosis complex, comprising DNA probes and primers 
specifically hybridizing to a DNA portion of the RD1 or RD5 region of M. tuberculosis, 
more particularly probes hybridizing under stringent conditions to a gene selected from 
Rv3871 (SEQ ID No 14), Rv3872 (SEQ ID No 15, mycobacterial PE), Rv3873 (SEQ ID 
No 16, PPE), Rv3874 (SEQ ID No 17, CFP-10), Rv3875 (SEQ ID No 18, ESAT-6) and 
Rv3876 (SEQ ID No 19), preferably CFP-10 and ESAT-6. 

As used herein, the term "stringent conditions" refers to conditions which permit 
hybridization between the probe sequences and the polynucleotide sequence to be 
detected. Suitably stringent conditions can be defined by, for example, the 
concentrations of salt or formamide in the prehybridization and hybridization solutions, 
or by the hybridization temperature, and are well known in the art. In particular, 
stringency can be increased by reducing the concentration of salt, increasing the 
concentration of formamide, or raising the hybridization temperature. The temperature 
range corresponding to a particular level of stringency can be further narrowed by 
calculating the purine to pyrimidine ratio of the nucleic acid of interest and adjusting the 
temperature accordingly. Variations on the above ranges and conditions are well known 
in the art. 
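The qualitative rules above can be illustrated numerically: lower salt, more formamide or a higher temperature all give higher stringency. The sketch below assumes one widely quoted empirical approximation for the melting temperature (Tm) of long DNA-DNA hybrids. Its coefficients differ between sources, it is not reliable for short oligonucleotide primers, and its base-composition term is G+C content rather than the purine to pyrimidine ratio mentioned in the text. The function names and the example probe and conditions are hypothetical. At a fixed hybridization temperature, any change that lowers Tm brings the reaction closer to Tm, and so makes it more stringent.

```python
import math


def hybrid_tm(na_molar, gc_percent, formamide_percent, length_bp):
    """Approximate Tm (deg C) of a long DNA-DNA hybrid.

    Assumed empirical form (coefficients vary between published sources):
    Tm = 81.5 + 16.6*log10[Na+] + 0.41*(%GC) - 0.61*(%formamide) - 500/L
    """
    if na_molar <= 0 or length_bp <= 0:
        raise ValueError("salt concentration and length must be positive")
    return (81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent
            - 0.61 * formamide_percent - 500.0 / length_bp)


def stringency_margin(hyb_temp_c, **conditions):
    """Degrees between Tm and the hybridization temperature.

    A smaller margin means hybridization runs closer to Tm, i.e. more stringent.
    """
    return hybrid_tm(**conditions) - hyb_temp_c


# Hypothetical 500 bp probe with 65% G+C, hybridized at 42 deg C.
high_salt = stringency_margin(42, na_molar=0.75, gc_percent=65,
                              formamide_percent=0, length_bp=500)
low_salt = stringency_margin(42, na_molar=0.075, gc_percent=65,
                             formamide_percent=0, length_bp=500)
with_formamide = stringency_margin(42, na_molar=0.75, gc_percent=65,
                                   formamide_percent=50, length_bp=500)
print(f"margins (deg C): high salt {high_salt:.1f}, low salt {low_salt:.1f}, "
      f"50% formamide {with_formamide:.1f}")
```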

Among the preferred primers, we can cite: 

primer esat-6F GTCACGTCCATTCATTCCCT (SEQ ID No 32), 
primer esat-6R ATCCCAGTGACGTTGCCTT (SEQ ID No 33), 
primer RD1mic flanking region F GCAGTGCAAAGGTGCAGATA (SEQ ID No 34), 
primer RD1mic flanking region R GATTGAGACACTTGCCACGA (SEQ ID No 35), 
primer RD5mic flanking region F GAATGCCGACGTCATATCG (SEQ ID No 39), 
primer RD5mic flanking region R CGGCCACTGAGTTCGATTAT (SEQ ID No 40). 
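A quick sanity check on the primers listed above is to compute their length, G+C content and a rough melting temperature. The sketch below uses the Wallace rule of thumb, Tm ≈ 2(A+T) + 4(G+C) °C. This is only a coarse estimate for short oligonucleotides in standard salt, and it is not presented as the inventors' design method. The dictionary keys are simply labels for the sequences given in the text, and the helper names are hypothetical.

```python
PRIMERS = {
    "esat-6F (SEQ ID No 32)": "GTCACGTCCATTCATTCCCT",
    "esat-6R (SEQ ID No 33)": "ATCCCAGTGACGTTGCCTT",
    "RD1mic flanking F (SEQ ID No 34)": "GCAGTGCAAAGGTGCAGATA",
    "RD1mic flanking R (SEQ ID No 35)": "GATTGAGACACTTGCCACGA",
    "RD5mic flanking F (SEQ ID No 39)": "GAATGCCGACGTCATATCG",
    "RD5mic flanking R (SEQ ID No 40)": "CGGCCACTGAGTTCGATTAT",
}


def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough Tm (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC {gc_fraction(seq):.0%}, "
          f"Wallace Tm ~{wallace_tm(seq)} C")
```

Primer pairs with similar estimated Tm values are generally easier to use together in one PCR. For precise work, nearest-neighbour Tm models are usually preferred over this rule.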

The present invention also covers the complementary nucleotide sequences of the above 
primers, as well as the nucleotide sequences hybridizing with them under stringent 
conditions and having at least 20 nucleotides and fewer than 500 nucleotides. 
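The complementary sequences and the length window mentioned above can be expressed as two short helpers. In this sketch `reverse_complement` and `within_claimed_length` are hypothetical names. The check covers only the stated length criterion (at least 20 and fewer than 500 nucleotides). It does not test hybridization under stringent conditions, which has to be established experimentally.

```python
# Translation table for the Watson-Crick complement (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A<->T, C<->G)."""
    return seq.translate(COMPLEMENT)[::-1]


def within_claimed_length(seq, min_len=20, max_len_exclusive=500):
    """True if the sequence has at least 20 and fewer than 500 nucleotides."""
    return min_len <= len(seq) < max_len_exclusive


esat6_f = "GTCACGTCCATTCATTCCCT"  # primer esat-6F, SEQ ID No 32
print(reverse_complement(esat6_f), within_claimed_length(esat6_f))
```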

Diagnostic kits for the identification at the species level of members of the M. 
tuberculosis complex comprising at least one, two, three or more antibodies directed to 
mycobacterial PE, PPE, CFP-10 or ESAT-6 are also embraced by the invention. 

Preferably, such a kit comprises antibodies directed to CFP-10 and ESAT-6. 

As used herein, the term "antibody" refers to intact molecules as well as fragments 
thereof, such as Fab, F(ab')2 and Fv, which are capable of binding the epitopic 
determinant. Probes or antibodies can be labeled with isotopes, fluorescent or 
phosphorescent molecules or by any other means known in the art. 

The invention also relates to virulence markers associated with the RD1 and/or RD5 regions 
of the genome of M. tuberculosis, or a part of these regions. 

The invention is further detailed below and will be illustrated with the following figures. 
Figure legends 

Figure 1: M. bovis BCG and M. microti have a chromosomal deletion, RD1, 
spanning the cfp10-esat6 locus. 

(A) Map of the cfp10-esat6 region showing the six possible reading frames and the M. 
tuberculosis H37Rv gene predictions. This map is also available at 
http://genolist.pasteur.fr/TubercuList/. 

The deleted regions are shown for BCG and M. microti with their respective H37Rv 
genome coordinates, and the extent of the conserved ESAT-6 locus (F. Tekaia, et al., 
Tubercle Lung Disease 79, 329 (1999)) is indicated by the gray bar. 

(B) Table showing characteristics of deleted regions selected for complementation 
analysis. Potential virulence factors and their putative functions disrupted by each 
deletion are shown. The coordinates are for the M. tuberculosis H37Rv genome. 

(C) Clones used to complement BCG. Individual clones spanning RD1 regions (RD1-I106 
and RD1-2F9) were selected from an ordered M. tuberculosis genomic library (R.B., 
unpublished) in pYUB412 (S. T. Cole, et al., Nature 393, 537 (1998) and W. R. Bange, 
F. M. Collins, W. R. Jacobs, Jr., Tuber. Lung Dis. 79, 171 (1999)) and electroporated 
into M. bovis BCG strains or M. microti. Hygromycin-resistant transformants were 
verified using PCR specific for the corresponding genes. pAP35 was derived from RD1-2F9 
by excision of an AflII fragment. pAP34 was constructed by subcloning an EcoRI-XbaI 
fragment into the integrative vector pKINT. The ends of each fragment are related 
to the BCG RD1 deletion (shaded box) with black lines, and the H37Rv coordinates for 
the other fragment ends are given in kilobases. 

(D) Immunoblot analysis, using an ESAT-6 monoclonal antibody, of whole cell protein 
extracts from log-phase cultures of (well n°1) H37Rv (S. T. Cole, et al., Nature 393, 537 
(1998)), (n°2) BCG::pYUB412 (M. A. Behr, et al., Science 284, 1520 (1999)), (n°3) 
BCG::RD1-I106 (R. Brosch, et al., Infection Immun. 66, 2221 (1998)), (n°4) 
BCG::RD1-2F9 (S. V. Gordon, et al., Molec Microbiol 32, 643 (1999)), (n°5) M. bovis 
(H. Salamon et al., Genome Res 10, 2044 (2000)), (n°6) Mycobacterium smegmatis (G. 
G. Mahairas, et al., J. Bacteriol 178, 1274 (1996)), (n°7) M. smegmatis::pYUB412, and 
(n°8) M. smegmatis::RD1-2F9 (R. Brosch, et al., Proc Natl Acad Sci USA 99, 3684 
(2002)). 

Figure 2: Complementation of BCG Pasteur with the RD1 region alters the colony 
morphology and leads to accumulation of Rv3873 and ESAT-6 in the cell wall. 

(A) Serial dilutions of 3-week-old cultures of BCG::pYUB412, BCG::I106 or 
BCG::RD1-2F9 growing on Middlebrook 7H10 agar plates. The white square shows the 
area of the plate magnified in the image to the right. 

(B) Light microscope image at fifty-fold magnification of BCG::pYUB412 and 
BCG::RD1-2F9 colonies. Five drops of bacterial suspensions of each strain were spotted 
adjacently onto 7H10 plates and imaged after 10 days of growth, illuminating the colonies 
through the agar. 

(C) Immunoblot analysis of different cell fractions of H37Rv obtained from 
http://www.cvmbs.colostate.edu using either an anti-ESAT-6 antibody or 

(D) anti-Rv3873 (PPE) rabbit serum. H37Rv and BCG signify whole cell extracts from 
the respective bacteria, and Cyt, Mem and CW correspond to the cytosolic, membrane 
and cell wall fractions of M. tuberculosis H37Rv. 

Figure 3: Complementation of BCG Pasteur with the RD1 region increases 
bacterial persistence and pathogenicity in mice. 

(A) Bacteria in the spleen and lungs of BALB/c mice following intravenous (i.v.) 
infection via the lateral tail vein with 10^6 colony forming units (cfu) of M. tuberculosis 
H37Rv (black) or 10^7 cfu of either BCG::pYUB412 (light grey) or BCG::RD1-I106 
(grey). 

(B) Bacterial persistence in the spleen and lungs of C57BL/6 mice following i.v. 
infection with 10^5 cfu of BCG::pYUB412 (light grey), BCG::RD1-I106 (middle grey) or 
BCG::RD1-2F9 (dark grey). 

(C) Bacterial multiplication after i.v. infection with 10^6 cfu of BCG::pYUB412 (light 
grey) and BCG::RD1-2F9 (grey) in severe combined immunodeficiency (SCID) mice. 
For A, B and C, each timepoint is the mean of 3 to 4 mice and the error bars represent 
standard deviations. 

(D) Spleens from SCID mice three weeks after i.v. infection with 10^6 cfu of either 
BCG::pYUB412, BCG::RD1-2F9 or BCG::I301 (an RD3 "knock-in", Fig. 1B). The 
scale is in cm. 

Figure 4: Immunisation of mice with BCG::RD1 generates marked ESAT-6-specific 
T-cell responses and enhanced protection against a challenge with M. tuberculosis. 

(A) Proliferative response of splenocytes of C57BL/6 mice immunised subcutaneously 
(s.c.) with 10^6 CFU of BCG::pYUB412 (open squares) or BCG::RD1-2F9 (solid 
squares) to in vitro stimulation with various concentrations of synthetic peptides from 
poliovirus type 1 capsid protein VP1, ESAT-6 or Ag85A (K. Huygen, et al., Infect 
Immun. 62, 363 (1994), L. Brandt, J. Immunol. 157, 3527 (1996) and C. Leclerc et al., J. 
Virol. 65, 711 (1991)). 

(B) Proliferation of splenocytes from BCG::RD1-2F9-immunised mice in the absence or 
presence of 10 μg/ml of ESAT-6 1-20 peptide, with or without 1 μg/ml of anti-CD4 
(GK1.5) or anti-CD8 (H35-17-2) monoclonal antibody. Results are expressed as mean 
and standard deviation of [3H]-thymidine incorporation from duplicate wells. 
(C) Concentration of IFN-γ in culture supernatants of splenocytes of C57BL/6 mice 
stimulated for 72 h with peptides or PPD after s.c. or i.v. immunisation with either 
BCG::pYUB412 (middle grey and white) or BCG::RD1-2F9 (light grey and black). 
Mice were inoculated with either 10^6 (white and light grey) or 10^7 (middle grey and 
black) cfu. Levels of IFN-γ were quantified using a sandwich ELISA (detection limit of 
500 pg/ml) with the mAbs R4-6A2 and biotin-conjugated XMG1.2. Results are 
expressed as the mean and standard deviation of duplicate culture wells. 

(D) Bacterial counts in the spleen and lungs of vaccinated and unvaccinated BALB/c 
mice 2 months after an i.v. challenge with M. tuberculosis H37Rv. The mice were 
challenged 2 months after i.v. inoculation with 10^6 cfu of either BCG::pYUB412 or 
BCG::RD1-2F9. Organ homogenates for bacterial enumeration were plated on 7H11 
medium, with or without hygromycin, to differentiate M. tuberculosis from residual 
BCG colonies. Results are expressed as the mean and standard deviation of 4 to 5 mice, 
and the levels of significance were derived using the Wilcoxon rank-sum test. 
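The Wilcoxon rank-sum comparison mentioned in this legend can be sketched for small groups such as 4 to 5 mice. This is an illustrative implementation, not the analysis actually performed. It computes the rank sum of the first group, gives tied values their average rank, and obtains an exact two-sided p-value by enumerating every possible split of the pooled ranks. The function names are hypothetical, and the log10 CFU values in the example are invented placeholders. For real analyses an established statistics package would normally be used.

```python
from itertools import combinations


def ranks_with_ties(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_rank_sum_exact(x, y):
    """Rank sum W of group x and an exact two-sided p-value (small samples only)."""
    pooled = list(x) + list(y)
    ranks = ranks_with_ties(pooled)
    n_x = len(x)
    w = sum(ranks[:n_x])
    expected = n_x * (len(pooled) + 1) / 2  # mean of W under the null hypothesis
    observed_dev = abs(w - expected)
    splits = list(combinations(range(len(pooled)), n_x))
    extreme = sum(
        1 for idx in splits
        if abs(sum(ranks[i] for i in idx) - expected) >= observed_dev - 1e-12
    )
    return w, extreme / len(splits)


# Invented log10 CFU per organ for two groups of mice, for illustration only.
group_a = [4.1, 4.3, 4.0, 4.2]
group_b = [4.8, 5.0, 4.9, 5.1, 4.7]
w, p = wilcoxon_rank_sum_exact(group_a, group_b)
print(f"W = {w}, exact two-sided p = {p:.4f}")
```

Exact enumeration suits groups of this size, since splitting 9 animals into groups of 4 and 5 gives only 126 possibilities. For larger groups, a normal approximation is usually used instead.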


Figure 5: Mycobacterium microti strain OV254 BAC map (BAC clones named 
MiXXX, where XXX is the identification number of the clone), overlaid on the M. 
tuberculosis H37Rv (BAC clones named RvXXX, where XXX is the identification 
number of the clone) and M. bovis AF2122/97 (BAC clones named MbXXX, where 
XXX is the identification number of the clone) BAC maps. The scale bars indicate the 
position on the M. tuberculosis genome. 

Figure 6: Difference in the region 4340-4360 kb between the RD1 deletion in BCG 
(A) and the RD1mic deletion in M. microti (C), relative to M. tuberculosis H37Rv (B). 

Figure 7: Difference in the region 3121-3127 kb between M. tuberculosis H37Rv (A) 
and M. microti OV254 (B). Gray boxes represent the direct repeats (DR), black ones the 
unique numbered spacer sequences. * Spacer sequence identical to spacer 58 
reported by van Embden et al. (42). Note that spacers 33-36 and 20-22 are not shown 
because H37Rv lacks these spacers. 

Figure 8: A) AseI PFGE profiles of various M. microti strains; hybridization with a 
radiolabeled B) esat-6 probe; C) probe of the RD1mic flanking region; D) plcA probe. 1. 
M. bovis AF2122/97, 2. M. canettii, 3. M. bovis BCG Pasteur, 4. M. tuberculosis H37Rv, 
5. M. microti OV254, 6. M. microti Myc 94-2272, 7. M. microti B3 type mouse, 8. M. 
microti B4 type mouse, 9. M. microti B2 type llama, 10. M. microti B1 type llama, 11. 
M. microti ATCC 35782. M: Low range PFGE marker (NEB). 

Figure 9: PCR products obtained from various M. microti strains using primers that 
flank the RD1mic region, primers amplifying the ESAT-6 gene, and primers that flank the 
MiD2 region. 1. M. microti B1 type llama, 2. M. microti B4 type mouse, 3. M. microti B3 
type mouse, 4. M. microti B2 type llama, 5. M. microti ATCC 35782, 6. M. microti OV254, 
7. M. microti Myc 94-2272, 8. M. tuberculosis H37Rv. 



Figure 10: Map of the M. tuberculosis H37Rv RD1 genomic region. Map of the 
fragments used to complement BCG and M. microti (black) and the genomic regions 
deleted from different mycobacterial strains (grey). The middle part shows key genes, 
putative promoters (P) and transcripts, the various proteins from the RD1 region, their 
sizes (number of amino acid residues), InterPro domains 
(http://www.ebi.ac.uk/interpro/), and membership of M. tuberculosis protein families from 
TubercuList (http://genolist.pasteur.fr/TubercuList/). The dashed lines mark the extent of 
the RD1 deletion in BCG, M. microti and M. tuberculosis clinical isolate MT56 (Brosch, 
R., et al. A new evolutionary scenario for the Mycobacterium tuberculosis complex. Proc 
Natl Acad Sci U S A 99, 3684-9 (2002)). M. bovis AF2122/97 is shown because it contains 
a frameshift mutation in Rv3881, a gene flanking the RD1 region of BCG. The fragments are drawn 
to show their ends in relation to the genetic map, unless they extend beyond the genomic 
region indicated. pRD1-2F9, pRD1-I106 and pAP35 are based on pYUB412; pAP34 on 
pKINT; pAP47 and pAP48 on pSM81. 

Figure 11: Western blot analysis of various RD1 knock-ins of M. bovis BCG and M. 
microti. The left panel shows results of immunodetection of ESAT-6, CFP-10 and 
PPE68 (Rv3873) in whole cell lysates (WCL) and culture supernatants of BCG; the 
centre panel displays the equivalent findings from M. microti, and the right panel 
contains M. tuberculosis H37Rv control samples. Samples from mycobacteria 
transformed with the following plasmids were present in lanes: pYUB412 vector 
control; 1, pAP34; 2, pAP35; 3, RD1-I106; 4, RD1-2F9. The positions of the nearest 
molecular weight markers are indicated. 

Figure 12: Analysis of immune responses induced by BCG recombinants. A, The upper 
three panels display the results of splenocyte proliferation assays in response to 
stimulation in vitro with a peptide from MalE (negative control), to PPD or to a peptide 
containing an immunodominant CD4 epitope from ESAT-6. B, The lower panel shows 
IFN-γ production by splenocytes in response to the same antigens. Symbols indicate the 
nature of the various BCG transformants. Samples were taken from C57BL/6 mice 
immunised subcutaneously. 

Figure 13: Further immunological characterization of responses to BCG::RD1-2F9. A, 
Proliferative response of splenocytes of C57BL/6 mice immunised subcutaneously (s.c.) 
with 10^6 CFU of BCG::pYUB412 or BCG::RD1-2F9 to in vitro stimulation with various 
concentrations of synthetic peptides from poliovirus type 1 capsid protein VP1 (negative 
control), ESAT-6 or Ag85A (see Methods for details). B, Proliferation of splenocytes 
from BCG::RD1-2F9-immunised mice in the absence or presence of ESAT-6 1-20 
peptide, with or without anti-CD4 or anti-CD8 monoclonal antibody. Results are 
expressed as mean and standard deviation of [3H]-thymidine incorporation from duplicate 
wells. C, Concentration of IFN-γ in culture supernatants of splenocytes of C57BL/6 mice 
stimulated for 72 h with peptides or PPD after s.c. or i.v. immunisation with either 
BCG::pYUB412 or BCG::RD1-2F9. Mice were inoculated with either 10^6 or 10^7 CFU. 
Results are expressed as the mean and standard deviation of duplicate culture wells. 

Figure 14: Mouse protection studies. A, Bacterial counts in the spleen and lungs of 
vaccinated and unvaccinated C57BL/6 mice 2 months after an i.v. challenge with M. 
tuberculosis H37Rv. The mice were challenged 2 months after i.v. inoculation with 10^6 
cfu of either BCG::pYUB412 or BCG::RD1-2F9. Organ homogenates for bacterial 
enumeration were plated on 7H11 medium, with or without hygromycin, to differentiate 
M. tuberculosis from residual BCG colonies. Results are expressed as the mean and 
standard deviation of 4 mice. Hatched columns correspond to the cohort of unvaccinated 
mice, while white and black columns correspond to mice vaccinated with 
BCG::pYUB412 and BCG::RD1-2F9, respectively. B, Bacterial counts in the spleen and 
lungs of vaccinated and unvaccinated C57BL/6 mice after an aerosol challenge with 1000 
CFU of M. tuberculosis. All mice were treated with antibiotics for three weeks prior to 
infection with M. tuberculosis. Data are the mean and SE measured on groups of three 
animals, and differences between groups were analysed using ANOVA (*p<0.05, 
**p<0.01). 

Figure 15: Guinea pig protection studies. A, Mean weight gain of vaccinated and 
unvaccinated guinea pigs following aerosol infection with M. tuberculosis H37Rv. 
Guinea pigs were vaccinated with either saline (triangles), BCG (squares) or 
BCG::RD1-2F9 (filled circles). The error bars are the standard error of the mean. Each time point 
represents the mean weight of six guinea pigs. For the saline-vaccinated group, the last 
live weight was used for calculating the means, as the animals were killed on signs of 
severe tuberculosis, which occurred after 50, 59, 71, 72, 93 and 93 days. B, Mean 
bacterial counts in the spleen and lungs of vaccinated and unvaccinated guinea pigs after 
an aerosol challenge with M. tuberculosis H37Rv. Groups of 6 guinea pigs were 
vaccinated subcutaneously with either saline, BCG or BCG::RD1-2F9 and infected 56 
days later. Vaccinated animals were killed 120 days following infection and 
unvaccinated ones on signs of suffering or significant weight loss. The error bars 
represent the standard error of the mean of six observations. C, Spleens of vaccinated 
guinea pigs 120 days after infection with M. tuberculosis H37Rv; left, animal 
immunised with BCG; right, animal immunised with BCG::RD1-2F9. 

Figure 16: Diagram of the M. tuberculosis H37Rv genomic region showing a working 
model for biogenesis and export of ESAT-6 proteins. It presents a possible functional 
model indicating predicted subcellular localization and potential interactions within the 
mycobacterial cell envelope. Rosetta stone analysis indicates direct interaction between 
proteins Rv3870 and Rv3871, and the sequence similarity between the N-terminal 
domains of Rv3868 and Rv3876 suggests that these putative chaperones might also 
interact. Rv3868 is a member of the AAA family of ATPases that perform chaperone- 
like functions by assisting in the assembly and disassembly of protein complexes 
(Neuwald, A.F., Aravind, L., Spouge, J.L. & Koonin, E.V. AAA+: A class of chaperone- 
like ATPases associated with the assembly, operation, and disassembly of protein 
complexes. Genome Res 9, 27-43 (1999)). It is striking that many type III secretion 
systems require chaperones for stabilisation of the effector proteins that they secrete and 
for prevention of premature protein-protein interactions (Page, A.L. & Parsot, C. 
Chaperones of the type III secretion pathway: jacks of all trades. Mol Microbiol 46, 1-11 
(2002)). Thus, Rv3868, and possibly Rv3876, may be required for the folding and/or 
dimerisation of ESAT-6/CFP-10 proteins (Renshaw, P.S., et al. Conclusive evidence 
that the major T-cell antigens of the M. tuberculosis complex ESAT-6 and CFP-10 form 
a tight, 1:1 complex and characterisation of the structural properties of ESAT-6, CFP-10 
and the ESAT-6-CFP-10 complex: implications for pathogenesis and virulence. J Biol 
Chem 8, 8 (2002)), or even to prevent premature dimerisation. ESAT-6/CFP-10 are 
predicted to be exported through a transmembrane channel consisting of at least Rv3870, 
Rv3871 and Rv3877, and possibly Rv3869, in a process catalysed by ATP hydrolysis. 
Rv3873 (PPE68) is known to occur in the cell envelope and may also be involved, as 
shown herein. 

Example 1: Preparation and assessment of M. bovis BCG::RD1 strains as a vaccine 
for treating or preventing tuberculosis. 

As mentioned above, we have found that complementation with RD1 was accompanied 
by a change in colonial appearance, as the BCG Pasteur "knock-in" strains developed a 
strikingly different morphotype (Fig. 2A). The RD1-complemented strains adopted a 
spreading, less rugose morphology that is characteristic of M. bovis, and this was more 
apparent when the colonies were inspected by light microscopy (Fig. 2B). Maps of the 
clones used are shown in Fig. 1C. These changes were seen following complementation 
with all of the RD1 constructs (Fig. 1C) and on complementing M. microti (data not 
shown). Pertinently, Calmette and Guérin (A. Calmette, La vaccination préventive 
contre la tuberculose (Masson et Cie., Paris, 1927)) observed a change in colony 
morphology during their initial passaging of M. bovis, and our experiments now 
demonstrate that this change, corresponding to loss of RD1, directly contributed to 
attenuating this virulent strain. The integrity of the cell wall is known to be a key 
virulence determinant for M. tuberculosis (C. E. Barry, Trends Microbiol 9, 237 (2001)), 
and changes in both cell wall lipids (M. S. Glickman, J. S. Cox, W. R. Jacobs, Jr., Mol 
Cell 5, 717 (2000)) and protein (F. X. Berthet et al., Science 282, 759 (1998)) have been 
shown to alter colony morphology and diminish persistence in animal models. 



To determine which genes were implicated in these morphological changes, antibodies 
recognising three RD1 proteins (Rv3873, CFP-10 and ESAT-6) were used in 
immunocytological and subcellular fractionation analyses. When the different cell 
fractions from M. tuberculosis were immunoblotted, all three proteins were localized in 
the cell wall fraction (Fig. 2C), though significant quantities of Rv3873, a PPE protein, 
were also detected in the membrane and cytosolic fractions (Fig. 2D). Using 
immunogold staining and electron microscopy, the presence of ESAT-6 in the envelope 
of M. tuberculosis was confirmed, but no alteration in capsular ultrastructure could be 
detected (data not shown). Previously, CFP-10 and ESAT-6 have been considered 
secreted proteins (F. X. Berthet et al., Microbiology 144, 3195 (1998)), but our results 
suggest that their biological functions are linked directly with the cell wall. 

Changes in colonial morphology are often accompanied by altered bacterial virulence. 
Initial assessment of the growth of different BCG::RD1 "knock-ins" in C57BL/6 or 
BALB/c mice following intravenous infection revealed that complementation did not 
restore levels of virulence to those of the reference strain M. tuberculosis H37Rv (Fig. 
3A). In longer-term experiments, modest yet significant differences were detected in the 
persistence of the BCG::RD1 "knock-ins" in comparison to BCG controls. Following 
intravenous infection of C57BL/6 mice, only the RD1 "knock-ins" were still detectable 
in the lungs after 106 days (Fig. 3B). This difference in virulence between the RD1 
recombinants and the BCG vector control was more pronounced in severe combined 
immunodeficiency (SCID) mice (Fig. 3C). The BCG::RD1-2F9 "knock-in" was 
markedly more virulent, as evidenced by the growth rate in lungs and spleen and also by 
an increased degree of splenomegaly (Fig. 3D). Cytological examination revealed 
numerous bacilli, extensive cellular infiltration and granuloma formation. These 
increases in virulence following complementation with the RD1 region demonstrate that 
the loss of this genomic locus contributed to the attenuation of BCG. 

The inability to restore full virulence to BCG Pasteur was not due to instability of our 
constructs nor to the strain used (data not shown). Essentially identical results were 
obtained on complementing BCG Russia, a strain less passaged than BCG Pasteur and 
presumed, therefore, to be closer to the original ancestor (M. A. Behr, et al., Science 284, 
1520 (1999)). This indicates that the attenuation of BCG was a polymutational process, 
and loss of residual virulence for animals was documented in the late 1920s (T. 
Oettinger, et al., Tuber Lung Dis 79, 243 (1999)). Using the same experimental strategy, 
we also tested the effects of complementing with RD3-5, RD7 and RD9 (S. T. Cole, et 
al., Nature 393, 537 (1998); M. A. Behr, et al., Science 284, 1520 (1999); R. Brosch, et 
al., Infection Immun. 66, 2221 (1998) and S. V. Gordon et al., Molec Microbiol 32, 643 
(1999)), which encode putative virulence factors (Fig. 1B). Reintroduction of these regions, 
which are not restricted to avirulent strains, did not affect virulence in immunocompetent 
mice. Although it is possible that deletion effects act synergistically, it seems 
more plausible that other attenuating mechanisms are at play. 

Since RD1 encodes at least two potent T-cell antigens (R. Colangeli, et al., Infect 
Immun. 68, 990 (2000), M. Harboe, et al., Infect Immun. 66, 717 (1998) and R. L. V. 
Skjøt, et al., Infect Immun. 68, 214 (2000)), we investigated whether its restoration 
induced immune responses to these antigens or even improved the protective capacity of 
BCG. Three weeks following either intravenous or subcutaneous inoculation with 
BCG::RD1 or BCG controls, we observed similar proliferation of splenocytes in response to an 
Ag85A (an antigenic BCG protein) peptide (K. Huygen, et al., Infect Immun. 62, 363 
(1994)), but not to a control viral peptide (Fig. 4A). Moreover, BCG::RD1 
generated powerful CD4+ T-cell responses against the ESAT-6 peptide, as shown by 
splenocyte proliferation (Fig. 4A, B) and strong IFN-γ production (Fig. 4C). In contrast, 
the BCG::pYUB412 control did not stimulate ESAT-6-specific T-cell responses, thus 
indicating that these were mediated by the RD1 locus. ESAT-6 is, therefore, highly 
immunogenic in mice in the context of recombinant BCG. 

When used as a subunit vaccine, ESAT-6 elicits T-cell responses and induces levels of 
protection weaker than but akin to those of BCG (L. Brandt et al., Infect. Immun. 68, 791 
(2000)). Challenge experiments were conducted to determine whether induction of immune 
responses to BCG::RD1-encoded antigens, such as ESAT-6, could improve protection 
against infection with M. tuberculosis. Groups of mice inoculated with either 
BCG::pYUB412 or BCG::RD1 were subsequently infected intravenously with M. 
tuberculosis H37Rv. These experiments showed that immunisation with the BCG::RD1 
"knock-in" inhibited the growth of M. tuberculosis within both BALB/c (Fig. 4D) and 
C57BL/6 mice when compared to inoculation with BCG alone. 

Although the increase in protection induced by BCG::RD1 relative to the BCG control is
modest, it demonstrates convincingly that genetic differences have arisen between
the live vaccine and the pathogen that have weakened the protective capacity of BCG.

This study therefore defines the genetic basis of a compromise that occurred, during
the attenuation process, between loss of virulence and reduced protection (M. A. Behr, P.
M. Small, Nature 389, 133 (1997)). The strategy of reintroducing, or even
overproducing (M. A. Horwitz et al., Proc Natl Acad Sci USA 97, 13853 (2000)), the
missing immunodominant antigens of M. tuberculosis in BCG could be combined with
an immuno-neutral attenuating mutation to create a more efficacious tuberculosis
vaccine.

Example 2: BAC-based comparative genomics identifies Mycobacterium microti as a
natural ESAT-6 deletion mutant

We searched for genetic differences between human and vole isolates that might
explain their different degrees of virulence and host preference, and in particular why the
vole isolates are harmless for humans. To this end, comparative genomics methods were
employed in connection with the present invention to identify major differences that may
exist between the M. microti reference strain OV254 and the fully sequenced strains
M. tuberculosis H37Rv (10) and M. bovis AF2122/97 (14). An ordered Bacterial
Artificial Chromosome (BAC) library of M. microti OV254 was constructed, and
individual BAC-to-BAC comparison of a minimal set of these clones with BAC clones
from previously constructed libraries of M. tuberculosis H37Rv and M. bovis AF2122/97
was undertaken.

Ten regions were detected in M. microti that differed from the corresponding
genomic regions in M. tuberculosis and M. bovis. To investigate whether these regions were
associated with the ability of M. microti strains to infect humans, their genetic
organization was studied in 8 additional M. microti strains, including some isolated
recently from patients with pulmonary tuberculosis. This analysis identified some
regions that were specifically absent from all tested M. microti strains but present in all
other members of the M. tuberculosis complex, and other regions that were absent only
from vole isolates of M. microti.



2.1 MATERIALS AND METHODS 


Bacterial strains and plasmids. M. microti OV254, which was originally isolated from
voles in the UK in the 1930s, was kindly supplied by M. J. Colston (45). DNA from M.
microti OV216 and OV183 was included in a set of strains used during a multicenter
study (26). M. microti Myc 94-2272 was isolated in 1988 from the perfusion fluid of a
41-year-old dialysis patient (43) and was kindly provided by L. M. Parsons. M. microti
ATCC 35782 was purchased from the American Type Culture Collection (designation TMC 1608
(M.P. Prague)). M. microti B1 type llama, B2 type llama, B3 type mouse and B4 type
mouse were obtained from the collection of the National Reference Center for
Mycobacteria, Forschungszentrum Borstel, Germany. M. bovis strain AF2122/97,
spoligotype 9, was responsible for a herd outbreak in Devon in the UK and has been
isolated from lesions in both cattle and badgers. Typically, mycobacteria were grown in
Middlebrook 7H9 liquid medium (Difco) containing 10% oleic-acid-dextrose-catalase
(Difco), 0.2% pyruvic acid and 0.05% Tween 80.

Library construction, preparation of BAC DNA and sequencing reactions.

Preparation of agarose-embedded genomic DNA from M. microti strain OV254, M.
tuberculosis H37Rv and M. bovis BCG was performed as described by Brosch et al. (5). The
M. microti library was constructed by ligation of partially digested HindIII fragments
(50-125 kb) into pBeloBAC11. Of the 10,000 clones obtained, 2,000 were
picked into 96-well plates and stored at -80°C. Plasmid preparations of recombinant
clones for sequencing reactions were obtained by pooling eight copies of 96-well plates,
with each well containing an overnight culture in 250 µl 2YT medium with 12.5 µg.ml-1
chloramphenicol. After 5 min centrifugation at 3000 rpm, the bacterial pellets were
resuspended in 25 µl of solution A (25 mM Tris, pH 8.0, 50 mM glucose and 10 mM
EDTA), and cells were lysed by adding 25 µl of buffer B (NaOH 0.2 M, SDS 0.2%). Then
20 µl of cold 3 M sodium acetate pH 4.8 was added and the plates were kept on ice for
30 min. After centrifugation at 3000 rpm for 30 min, the pooled supernatants (140 µl)
were transferred to new plates. 130 µl of isopropanol was added, and after 30 min on ice,
DNA was pelleted by centrifugation at 3500 rpm for 15 min. The supernatant was discarded
and the pellet resuspended in 50 µl of a 10 µg/ml RNase A solution (in Tris 10 mM pH 7.5 /
EDTA 10 mM) and incubated at 64°C for 15 min. After precipitation (2.5 µl of sodium
acetate 3 M pH 7 and 200 µl of absolute ethanol), pellets were rinsed with 200 µl of 70%
ethanol, air dried and finally suspended in 20 µl of TE buffer.

End-sequencing reactions were performed with a Taq DyeDeoxy Terminator cycle
sequencing kit (Applied Biosystems) using a mixture of 13 µl of DNA solution, 2 µl of
primer (2 µM) (SP6-BAC1, AGTTAGCTCACTCATTAGGCA (SEQ ID No 15), or
T7-BAC1, GGATGTGCTGCAAGGCGATTA (SEQ ID No 16)), 2.5 µl of Big Dye and 2.5
µl of a 5X buffer (50 mM MgCl2, 50 mM Tris). Thermal cycling was performed on a
PTC-100 amplifier (MJ Inc.) with an initial denaturation step of 60 s at 95°C, followed
by 90 cycles of 15 s at 95°C, 15 s at 56°C and 4 min at 60°C. DNA was then precipitated
with 80 µl of 76% ethanol and centrifuged at 3000 rpm for 30 min. After the
supernatant was discarded, DNA was rinsed with 80 µl of 70% ethanol and resuspended in
appropriate buffers depending on the type of automated sequencer used (ABI 377 or ABI
3700). Sequence data were transferred to Digital workstations and edited using the TED
software from the Staden package (37). Edited sequences were compared against the M.
tuberculosis H37Rv database (http://genolist.pasteur.fr/TubercuList/), the M. bovis
BLAST server (http://www.sanger.ac.uk/Projects/M_bovis/blast_server.shtml) and
in-house databases to determine the relative positions of the M. microti OV254 BAC
end-sequences.

Preparation of BAC DNA from recombinants and BAC digestion profile
comparison. DNA for digestion was prepared as previously described (4). DNA (1 µg)
was digested with HindIII (Boehringer) and restriction products were separated by
pulsed-field gel electrophoresis (PFGE) on a Biorad CHEF-DR III system using a 1% (w/v)
agarose gel and a pulse of 3.5 s for 17 h at 6 V.cm-1. Low-range PFGE markers (NEB) were
used as size standards. Insert sizes were estimated after ethidium bromide staining and
visualization under UV light. Comparisons were made between overlapping clones
from the M. microti OV254, M. bovis AF2122/97 and M. tuberculosis H37Rv
pBeloBAC11 libraries.

PCR analysis to determine the presence of genes in different M. microti strains.
Reactions contained 5 µl of 10x PCR buffer (100 mM β-mercaptoethanol, 600 mM
Tris-HCl, pH 8.8, 20 mM MgCl2, 170 mM (NH4)2SO4, 20 mM dNTP mix), 2.5 µl
of each primer at 2 µM, 10 ng of template DNA, 10% DMSO and 0.5 unit of Taq
polymerase in a final volume of 12.5 µl. Thermal cycling was performed on a PTC-100
amplifier (MJ Inc.) with an initial denaturation step of 90 s at 95°C, followed by 35
cycles of 45 s at 95°C, 1 min at 60°C and 2 min at 72°C.

RFLP analysis. In brief, agarose plugs of genomic DNA prepared as previously
described (5) were digested with either AseI, DraI or XbaI (NEB), then electrophoresed
on a 1% agarose gel, and finally transferred to Hybond-C extra nitrocellulose membranes
(Amersham). Different probes were amplified by PCR from M. microti strain OV254
or M. tuberculosis H37Rv using primers for:
esat-6 (esat-6F GTCACGTCCATTCATTCCCT (SEQ ID No 17);
esat-6R ATCCCAGTGACGTTGCCTT (SEQ ID No 18)),
the RD1mic flanking region (4340,209F GCAGTGCAAAGGTGCAGATA (SEQ ID No
19); 4354,701R GATTGAGACACTTGCCACGA (SEQ ID No 20)), or
plcA (plcA.int.F CAAGTTGGGTCTGGTCGAAT (SEQ ID No 21); plcA.int.R
GCTACCCAAGGTCTCCTGGT (SEQ ID No 22)). Amplification products were
radiolabeled using the Prime-It II kit (Stratagene). Hybridizations were
performed at 65°C in a solution containing 0.8 M NaCl, 5 mM EDTA pH 8,
50 mM sodium phosphate pH 8, 2% SDS, 1X Denhardt's reagent and 100 µg/ml salmon sperm
DNA (Genaxis). Membranes were exposed to phosphorimager screens and images were
digitized using a STORM phosphorimager.
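As a rough sanity check on the primers listed above, the sketch below computes GC content and an approximate melting temperature with the Wallace rule (about 2°C per A/T plus 4°C per G/C). It is an illustrative aid and not part of the described method. The Wallace rule is only a coarse approximation for oligonucleotides of this length, and nearest-neighbour models that take salt and primer concentration into account are more accurate. The function names are hypothetical.

```python
def gc_content(primer):
    """Percentage of G and C bases in a primer sequence."""
    seq = primer.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer):
    """Rough melting temperature (deg C): 2 per A/T plus 4 per G/C (Wallace rule)."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Primer sequences as listed in the RFLP probe section above.
PRIMERS = {
    "esat-6F": "GTCACGTCCATTCATTCCCT",
    "esat-6R": "ATCCCAGTGACGTTGCCTT",
    "4340,209F": "GCAGTGCAAAGGTGCAGATA",
    "4354,701R": "GATTGAGACACTTGCCACGA",
    "plcA.int.F": "CAAGTTGGGTCTGGTCGAAT",
    "plcA.int.R": "GCTACCCAAGGTCTCCTGGT",
}

for name, seq in PRIMERS.items():
    print(f"{name:11s} {len(seq)} nt  GC {gc_content(seq):.0f}%  Tm ~{wallace_tm(seq)} C")
```

For the 20-mer flanking primer 4340,209F, the sketch gives 50% GC and a Wallace estimate of 60°C. This agrees with the 60°C annealing step used for the PCR analysis.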

DNA sequence accession numbers. The nucleotide sequences that flank MiD1, MiD2 and
MiD3, as well as the junction sequence of RD1mic, have been deposited in the EMBL
database under accession numbers AJ345005, AJ345006, AJ315556 and AJ315557,
respectively.

2.2 RESULTS
DNA into Escherichia coli DH10B yielded about 10,000 recombinant clones, of
which 2,000 were isolated and stored in 96-well plates. Using the complete sequence of
the M. tuberculosis H37Rv genome as a scaffold, end-sequencing of 384 randomly
chosen M. microti BAC clones allowed us to select enough clones to cover almost all of
the 4.4 Mb chromosome. The few clones spanning regions not covered
by this approach were identified by PCR screening of pools as previously described (4).
This resulted in a minimal set of 50 BACs covering over 99.9% of the M. microti
OV254 genome, whose positions relative to M. tuberculosis H37Rv are shown in Figure
5. Insert sizes ranged between 50 and 125 kb, and the recombinant clones were
stable. Compared with other BAC libraries from tubercle bacilli (4, 13), the M. microti
OV254 BAC library contained clones that were generally larger than those obtained
previously, which facilitated the comparative genomics approach described below.
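The library figures above can be put in context with the Clarke-Carbon formula, P = 1 - (1 - I/G)^N. It estimates the probability that a given locus is carried by at least one of N random clones of insert size I, drawn from a genome of size G. This is an illustration only. It assumes random, unbiased cloning and a uniform insert size (here 87.5 kb, the midpoint of the reported 50-125 kb range, which is an assumption). Real BAC libraries do not strictly meet these conditions. The minimal set of 50 BACs was chosen by end-sequence mapping, not by this calculation.

```python
import math


def clarke_carbon(n_clones, insert_kb, genome_kb):
    """Probability that a given locus lies in at least one of n random clones."""
    return 1.0 - (1.0 - insert_kb / genome_kb) ** n_clones


def clones_needed(probability, insert_kb, genome_kb):
    """Smallest number of random clones reaching the requested probability."""
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - insert_kb / genome_kb))


GENOME_KB = 4400.0     # ~4.4 Mb chromosome, as stated in the text
MEAN_INSERT_KB = 87.5  # assumption: midpoint of the 50-125 kb insert range

p_384 = clarke_carbon(384, MEAN_INSERT_KB, GENOME_KB)
n_999 = clones_needed(0.999, MEAN_INSERT_KB, GENOME_KB)
print(f"P(locus in >= 1 of 384 clones) = {p_384:.4f}")
print(f"random clones needed for 99.9%: {n_999}")
```

Under these assumptions, 384 end-sequenced clones would be expected to include almost every locus. This is consistent with the few remaining gaps being closed by PCR screening of pools.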

Identification of DNA deletions in M. microti OV254 relative to M. tuberculosis
H37Rv by comparative genomics. The minimal overlapping set of 50 BAC clones,
together with the availability of three other ordered BAC libraries from M. tuberculosis
H37Rv, M. bovis BCG Pasteur 1173P2 (5, 13) and M. bovis AF2122/97 (14), allowed us
to carry out direct BAC-to-BAC comparison of clones spanning the same genomic
regions. Size differences of PFGE-separated HindIII restriction fragments from M.
microti OV254 BACs, relative to restriction fragments from M. bovis and/or M.
tuberculosis BAC clones, identified loci that differed among the tested strains. Size
variations of at least 2 kb were easily detectable, and 10 deleted regions, evenly
distributed around the genome and containing more than 60 open reading frames
(ORFs), were identified. These regions represent over 60 kb that are missing from the M.
microti OV254 strain compared with M. tuberculosis H37Rv. First, it was found that
phiRv2 (RD11), one of the two M. tuberculosis H37Rv prophages, was present in M.
microti OV254, whereas phiRv1, also referred to as RD3 (29), was absent. Second, it was
found that M. microti lacks four of the genomic regions that are also absent from M.
bovis BCG. In fact, these four regions of difference, named RD7, RD8, RD9 and RD10,
are absent from all members of the M. tuberculosis complex with the exception of M.
tuberculosis and M. canettii, and seem to have been lost from a common progenitor
strain of M. africanum, M. microti and M. bovis (3). As such, our results obtained with
individual BAC-to-BAC comparisons show that M. microti is part of this non-M.
tuberculosis lineage of the tubercle bacilli, and this was further confirmed by
sequencing the junction regions of RD7-RD10 in M. microti OV254. The sequences
obtained were identical to those from M. africanum, M. bovis and M. bovis BCG strains.
Apart from these four conserved regions of difference and phiRv1 (RD3), M. microti
OV254 did not show any other RDs with junction regions identical to those of M. bovis BCG
Pasteur, which lacks at least 17 RDs relative to M. tuberculosis H37Rv (1, 13, 35).
However, five other regions missing from the genome of M. microti OV254 relative to
M. tuberculosis H37Rv were identified (RD1mic, RD5mic, MiD1, MiD2 and MiD3). These
regions are specific either to strain OV254 or to M. microti strains in general.
Interestingly, two of these regions, RD1mic and RD5mic, partially overlap RDs from M.
bovis BCG.
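The BAC-to-BAC comparison described above matches HindIII fragment sizes between clones that span the same region and flags fragments that have no counterpart. The sketch below shows one simple way to do that matching. The greedy pairing, the 0.5 kb tolerance and the fragment sizes in the example are all hypothetical choices for illustration, not values from the study. A greedy match is also not guaranteed to be optimal when many fragments have similar sizes.

```python
def unmatched_fragments(ref_kb, query_kb, tol_kb=0.5):
    """Pair fragments whose sizes agree within tol_kb and return the leftovers.

    Returns (ref_only, query_only): fragment sizes (kb) from each profile
    that found no partner in the other profile.
    """
    remaining = sorted(query_kb)
    ref_only = []
    for size in sorted(ref_kb):
        # Greedy: take the first still-unpaired query fragment within tolerance.
        match = next((q for q in remaining if abs(q - size) <= tol_kb), None)
        if match is None:
            ref_only.append(size)
        else:
            remaining.remove(match)
    return ref_only, remaining


# Hypothetical HindIII profiles (kb) of two BACs covering the same region.
reference_bac = [30.0, 22.5, 14.0, 9.0]
microti_bac = [30.2, 22.5, 9.0, 6.5]

ref_only, query_only = unmatched_fragments(reference_bac, microti_bac)
shift_kb = sum(ref_only) - sum(query_only)
print(ref_only, query_only, f"net difference {shift_kb:.1f} kb")
```

In this invented example, the 14.0 kb reference fragment is replaced by a 6.5 kb fragment, a net loss of 7.5 kb. That is well above the 2 kb size difference the text reports as easily detectable, so such a locus would be a candidate for PCR and sequence follow-up.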

Antigens ESAT-6 and CFP-10 are absent from M. microti. One of the most
interesting findings of the BAC-to-BAC comparison was a novel deletion in a genomic
region close to the origin of replication (Figure 5). Detailed PCR and sequence analysis
of this region in M. microti OV254 showed a 14 kb segment to be missing (equivalent
to M. tuberculosis H37Rv positions 4340,4 to 4354,5 kb) that partly overlapped RD1bcg,
which is absent from M. bovis BCG. More precisely, ORFs Rv3864 and Rv3876 are truncated in
M. microti OV254 and ORFs Rv3865 to Rv3875 are absent (Figure 6). This observation
is particularly interesting, as previous comparative genomic analyses identified RD1bcg as
the only RD region that is specifically absent from all BCG sub-strains but present in all
other members of the M. tuberculosis complex (1, 4, 13, 29, 35). As shown in Figure 6,
in M. microti OV254 the RD1mic deletion is responsible for the loss of a large portion of
the conserved ESAT-6 family core region (40), including the genes coding for the major
T-cell antigens ESAT-6 and CFP-10 (2, 15). Previous deletion screening
protocols employed primer sequences designed for the right-hand portion of
the RD1bcg region (i.e. gene Rv3878) (6, 39), which explains why these investigations
did not detect the RD1mic deletion. Figure 6 shows that RD1mic does not affect genes
Rv3877, Rv3878 and Rv3879, which are part of the RD1bcg deletion.
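The overlap between RD1mic and RD1bcg described above can be expressed as set operations on H37Rv ORF numbers. In this sketch, the RD1mic gene list (Rv3864 to Rv3876, with the two end genes truncated) comes from the text. The RD1bcg extent (Rv3871 to Rv3879c) is an assumption taken from the published description of the BCG deletion, not from this section. Strand suffixes such as "c" are ignored, and the helper name is hypothetical.

```python
def orf_set(first, last):
    """H37Rv ORF identifiers Rv<first> .. Rv<last>, strand suffix omitted."""
    return {f"Rv{n}" for n in range(first, last + 1)}


# From the text: Rv3865-Rv3875 absent, Rv3864 and Rv3876 truncated in OV254.
RD1_MIC = orf_set(3864, 3876)
# Assumption: RD1 of BCG spans Rv3871-Rv3879c.
RD1_BCG = orf_set(3871, 3879)

shared = sorted(RD1_MIC & RD1_BCG)    # affected in both
mic_only = sorted(RD1_MIC - RD1_BCG)  # affected only in M. microti OV254
bcg_only = sorted(RD1_BCG - RD1_MIC)  # affected only in BCG

print("shared:", shared)
print("M. microti only:", mic_only)
print("BCG only:", bcg_only)
```

Under these assumptions, the genes affected only in BCG are Rv3877, Rv3878 and Rv3879, which matches the statement that RD1mic leaves them intact. The shared set includes Rv3874 and Rv3875, the genes encoding CFP-10 and ESAT-6.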

Deletion of phospholipase C genes in M. microti OV254. RD5mic, the other region
absent from M. microti OV254 that partially overlapped an RD region from BCG, was
revealed by comparison of BAC clone Mi18A5 with BAC Rv143 (Figure 5). PCR
analysis and sequencing of the junction region revealed that RD5mic was smaller than the
RD5 deletion in BCG (Tables 2 and 3 below).

TABLE 2

Description of the putative function of the deleted and truncated ORFs in M. microti OV254

Region    Start-End (kb)    Overlapping ORFs    Putative function or family
RD10      264,5-266,5       Rv0221-Rv0223       echA1
RD3       1779,5-1788,5     Rv1573-Rv1586       bacteriophage proteins
RD7       2207,5-2220,5     Rv1964-Rv1977       yrbE3A-3B; mce3A-F; unknown
RD9       2330-2332         Rv2072-Rv2075       cobL; probable oxidoreductase; unknown
RD5mic    2627,6-2633,4     Rv2348-Rv2352       plcA-C; member of PPE family
MiD1      3121,8-3126,6     Rv2816-Rv2819       IS6110 transposase; unknown
MiD2      3554,0-3555,2     Rv3187-Rv3190       IS6110 transposase; unknown
MiD3      3741,1-3755,7     Rv3345-Rv3349       members of the PE-PGRS and PPE families; insertion elements
RD8       4056,8-4062,7     Rv3617-Rv3618       ephA; lpqG; member of the PE-PGRS family
RD1mic    4340,4-4354,5     Rv3864-Rv3876       member of the CbxX/CfqX family; members of the PE and PPE families; ESAT-6; CFP10; unknown



TABLE 3. Sequence at the junction of the deleted regions in M. microti OV254

For each junction: position, ORFs, sequence at the junction and flanking primers.

RD1mic (SEQ ID No 23)
Position: 4340,421-4354,533
ORFs: Rv3864-Rv3876
Sequence at the junction:
CAAGACGAGGTTGTAAAACCTCGACG
CAGGATCGGCGATGAAATGCCAGTCG
GCGTCGCTGAGCGCGCGCTGCGCCOl
GTCCCA TUTGTCGCTGA TTTGTTTGAA CA
GCGACGAACCGGTGTTGAAAATGTCGCCT
GGGTCGGGGATTCCCT
Flanking primers:
4340,209F (SEQ ID No 19) GCAGTGCAAAGGTGCAGATA
4354,701R (SEQ ID No 20) GATTGAGACACTTGCCACGA

RD5mic (SEQ ID No 26)
Position: 2627,831-2635,581
ORFs: Rv2349-Rv2355
Sequence at the junction:
CCTCGATGAACCACCTGACATGACCC
CATCCTTTCCAAGAACTGGAGTCTCC
GGACATGCCGGGGCGGTTCACTGCCC
CAGGTGTCCTGGGTCGTTCCGTTGACCGT
CGAGTCCGAACATCCGTCATTCCCGGTGG
CAGTCGGTGCGGTGAC
Flanking primers:
2627,370F (SEQ ID No 24) GAATGCCGACGTCATATCG
2633,692R (SEQ ID No 25) CGGCCACTGAGTTCGATTAT

MiD1 (SEQ ID No 29)
Position: 3121,880-3126,684
ORFs: Rv2815c-Rv2818c
Sequence at the junction:
CACCTGACATGACCCCATCCTTTCCA
AGAACTGGAGTCTCCGGACATGCCGG
GGCGGTTCAGGGACATrCATGTCCATCTT
C7GGCAGATCAGCAGATCGC7TGTTCTCAG
TGCAGGTGAGTC
Flanking primers:
3121,690F (SEQ ID No 27) CAGCCAACACCAAGTAGACG
3126,924R (SEQ ID No 28) TCTACCTGCAGTCGCTTGTG

MiD2 (SEQ ID No 32)
Position: 3554,066-3555,259
ORFs: Rv3188-Rv3189
Sequence at the junction:
GCTGCCTACTACGCTCAACGCCAGAG
ACCAGCCGCCGGCTGAGGTCTCAGAT
CAGAGAGTCTCCGGACTCACCGGGGC
GGTICXTAAAGGCTTCGAGACCGGACGG
GCTGTAGGTTCCTCAACTGTGTGGCGGA T
GGTCTGA GCA CTTAA C
Flanking primers:
3553,880F (SEQ ID No 30) GTCCATCGAGGATGTCGAGT
3555,385R (SEQ ID No 31) CTAGGCCATTCCGTTGTCTG

MiD3 (SEQ ID No 35)
Position: 3741,139-3755,777
ORFs: Rv3345c-Rv3349c
Sequence at the junction:
TGGCGCCGGCACCTCCGTTGCCACCG
TTGCCGCCGCTGGTGGGCGCGGTGCC
GTTCGCCCCGGCCGAACCGTTCAGGG
CCGGGTTCGCCCTCAGCCGCTAAACACG
CCGACCAAGATCAACGAGCTACCFGCCCG
GTCAAGGTTGAAGAGCCCCCATATCAGCA
AGGGCCCGGTGTCGGCG
Flanking primers:
3740,950F (SEQ ID No 33) GGCGACGCCATTTCC
3755,988R (SEQ ID No 34) AACTGTCGGGCTTGCTCTT



In fact, M. microti OV254 lacks the genes plcA, plcB and plcC and one specific
PPE protein-encoding gene (Rv2352). This was confirmed by the absence of a clear band on a
Southern blot of AseI-digested genomic DNA from M. microti OV254 hybridized with a
plcA probe. However, the genes Rv2346c and Rv2347c, members of the esat-6 family,
and Rv2348c, which are missing from M. bovis and BCG strains (3), are still present in M.
microti OV254. The presence of an IS6110 element in this segment suggests that
recombination between two IS6110 elements could have been involved in the loss of
RD5mic. This is supported by the finding that the remaining copy of IS6110 does not
show a 3 base-pair direct repeat in strain OV254 (Table 3).
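The argument above uses the absence of a 3 bp direct repeat as a signature of recombination between two IS6110 copies. Transposition duplicates a short target site on both sides of the element. A copy left behind by recombination between two elements generally carries unrelated flanks. The check itself compares the bases immediately outside the element. The sketch below assumes both flanks are given on the same strand and directly adjacent to the IS6110 ends. The flank sequences in the example are invented for illustration, and the function name is hypothetical.

```python
def has_direct_repeat(left_flank, right_flank, length=3):
    """True if the `length` bases just 5' of the element equal those just 3' of it."""
    if len(left_flank) < length or len(right_flank) < length:
        raise ValueError("each flank must contain at least `length` bases")
    return left_flank[-length:].upper() == right_flank[:length].upper()


# Hypothetical flanks: a transposition-like insertion and a recombination-like copy.
print(has_direct_repeat("ttgaCGA", "CGAtcc"))  # True: CGA on both sides
print(has_direct_repeat("ttgaCGA", "GGTtcc"))  # False: no repeat
```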

Lack of MiD1 provides a genomic clue to the characteristic spoligotype of M. microti
OV254. MiD1 encompasses the three ORFs Rv2816, Rv2817 and Rv2818, which
encode putative proteins of unknown function. It lies in the
direct repeat (DR) region, a polymorphic locus in the genomes of the tubercle bacilli that
contains a cluster of 36 bp direct repeats separated by unique spacer sequences of 36
to 41 bp (17) (Figure 7). The presence or absence of 43 unique spacer sequences that
intercalate the DR sequences is the basis of spacer-oligo typing (spoligotyping), a powerful
typing method for strains from the M. tuberculosis complex (23). M. microti isolates exhibit a
characteristic spoligotype with an unusually small DR cluster, due to the presence of
only spacers 37 and 38 (43). In M. microti OV254, the absence of spacers 1 to 36, which
are present in many other M. tuberculosis complex strains, appears to result from an
IS6110-mediated deletion of 636 bp of the DR region. Amplification and PvuII
restriction analysis of a 2.8 kb fragment obtained with primers located in the genes that
flank the DR region (Rv2813c and Rv2819) showed that only one copy of
IS6110 remains in this region (Figure 7). This IS6110 element is inserted into ORF
Rv2819 at position 3,119,932 relative to the M. tuberculosis H37Rv genome. As for
other IS6110 elements that result from homologous recombination between two copies
(7), no 3 base-pair direct repeat was found for this copy of IS6110 in the DR region.
Concerning the absence of spacers 39-43 (Figure 7), M. microti showed
a slightly different organization of this locus from M. bovis strains, which also
characteristically lack spacers 39-43. In M. microti OV254, an extra 36 bp spacer was
found that was present neither in M. bovis nor in M. tuberculosis H37Rv. The sequence of
this specific spacer was identical to that of spacer 58 reported by van Embden and
colleagues (42). In their study of the DR region in many strains from the M. tuberculosis
complex, this spacer was found only in M. microti strain NLA000016240 (AF189828)
and in some ancestral M. tuberculosis strains (3, 42). Like MiD1, MiD2 most probably
results from an IS6110-mediated deletion of two genes (Rv3188 and Rv3189) that encode
putative proteins of unknown function (Table 3 above and Table 4 below).
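The spoligotype reasoning above treats each strain as a presence/absence pattern over the 43 standard spacers. The sketch below encodes such a pattern as a 43-character binary string. It then converts the string to the 15-digit octal code often used to report spoligotypes (fourteen groups of three spacers, then spacer 43 on its own). The octal convention is an assumption about reporting practice and is not stated in this section. Spacers outside the 43-spacer scheme, such as the extra spacer 58 mentioned above, are not represented.

```python
NUM_SPACERS = 43


def spoligo_binary(present):
    """43-character string with '1' for each present spacer (spacer 1 first)."""
    return "".join("1" if i in present else "0" for i in range(1, NUM_SPACERS + 1))


def spoligo_octal(binary):
    """15-digit octal code: spacers 1-42 in groups of three, then spacer 43."""
    if len(binary) != NUM_SPACERS:
        raise ValueError("expected a 43-spacer pattern")
    digits = [str(int(binary[i:i + 3], 2)) for i in range(0, 42, 3)]
    digits.append(binary[42])
    return "".join(digits)


# Vole-type M. microti pattern described in the text: only spacers 37 and 38.
vole_pattern = spoligo_binary({37, 38})
missing = [i for i, bit in enumerate(vole_pattern, start=1) if bit == "0"]
print(spoligo_octal(vole_pattern))  # 000000000000600
print(len(missing), "of 43 spacers absent")
```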

TABLE 4. Presence of the RD and MiD regions in different M. microti strains

Host: voles (OV254, OV183, OV216); humans (Myc 94-2272, B3 type mouse, B4 type mouse,
B1 type llama, B2 type llama). The original host of ATCC 35782 could not be identified
with certainty.

Region    OV254    OV183    OV216    ATCC     Myc 94   B3       B4       B1       B2
                                     35782    -2272    mouse    mouse    llama    llama
RD1mic    absent   absent   absent   absent   absent   absent   absent   absent   absent
RD3       absent   absent   absent   absent   absent   absent   absent   absent   absent
RD7       absent   absent   absent   absent   absent   absent   absent   absent   absent
RD8       absent   absent   absent   absent   absent   absent   absent   absent   absent
RD9       absent   absent   absent   absent   absent   absent   absent   absent   absent
RD10      absent   absent   absent   absent   absent   absent   absent   absent   absent
MiD3      absent   ND       ND       absent   absent   absent   absent   absent   absent
MiD1      absent   ND       ND       present  partial  partial  partial  present  present
RD5mic    absent   absent   absent   present  present  present  present  present  present
MiD2      absent   ND       ND       present  present  present  present  present  present

ND, not determined
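Table 4 supports two categories of regions: those absent from every tested M. microti strain, and those absent only from the vole isolates. Both can be derived mechanically from the table. The sketch below transcribes Table 4 and classifies each region. Three choices are assumptions of this illustration: ND is treated as missing data, a partial deletion counts as "not absent", and the vole group is the three strains named in the text (OV254, OV183, OV216).

```python
STRAINS = ["OV254", "OV183", "OV216", "ATCC 35782", "Myc 94-2272",
           "B3", "B4", "B1", "B2"]
VOLE_ISOLATES = {"OV254", "OV183", "OV216"}

ALL_ABSENT = ["absent"] * 9
TABLE_4 = {
    "RD1mic": ALL_ABSENT, "RD3": ALL_ABSENT, "RD7": ALL_ABSENT,
    "RD8": ALL_ABSENT, "RD9": ALL_ABSENT, "RD10": ALL_ABSENT,
    "MiD3": ["absent", "ND", "ND"] + ["absent"] * 6,
    "MiD1": ["absent", "ND", "ND", "present", "partial", "partial",
             "partial", "present", "present"],
    "RD5mic": ["absent"] * 3 + ["present"] * 6,
    "MiD2": ["absent", "ND", "ND"] + ["present"] * 6,
}


def classify(calls):
    """Classify one region from its per-strain calls, ignoring ND entries."""
    known = {s: c for s, c in zip(STRAINS, calls) if c != "ND"}
    absent = {s for s, c in known.items() if c == "absent"}
    if absent == set(known):
        return "absent from all tested strains"
    if absent and absent <= VOLE_ISOLATES:
        return "absent only from vole isolates"
    return "variable"


for region, calls in TABLE_4.items():
    print(f"{region:7s} {classify(calls)}")
```

With these conventions, RD1mic, RD3, RD7-RD10 and MiD3 fall in the first group, and RD5mic, MiD1 and MiD2 fall in the second. For MiD1 and MiD2 the vole assignment rests on OV254 alone, because neither region was determined for OV183 or OV216.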



WO 03/085098 PCT/IB03/01789



Absence of some members of the PPE family in M. microti. MiD3 was identified by
the absence of two HindIII sites in BAC Mi4B9 that exist at positions 3749 kb and 3754
kb in the M. tuberculosis H37Rv chromosome. By PCR and sequence analysis, it was
determined that MiD3 corresponds to a 12 kb deletion that has truncated or removed five
genes orthologous to Rv3345c-Rv3349c. Rv3347c encodes a protein of 3157 amino
acids that belongs to the PPE family, and Rv3346c encodes a conserved protein that is also
present in M. leprae. The functions of both these putative proteins are unknown, while
Rv3348 and Rv3349 are part of an insertion element (Table 2). At present, the
consequences of the MiD3 deletion for the biology of M. microti remain entirely
unknown.

Extra DNA in M. microti OV254 relative to M. tuberculosis H37Rv. M. microti
OV254 possesses the six regions RvD1 to RvD5 and TbD1. These are absent from the
sequenced strain M. tuberculosis H37Rv but have been shown to be present in
other members of the M. tuberculosis complex, such as M. canettii, M. africanum, M. bovis
and M. bovis BCG (3, 7, 13). In M. tuberculosis H37Rv, four of these regions (RvD2-5)
contain a copy of IS6110 that is not flanked by a direct repeat, suggesting that
recombination between two IS6110 elements was involved in the deletion of the intervening
genomic regions (7). Consequently, it seems plausible that these regions were deleted
from the M. tuberculosis H37Rv genome rather than specifically acquired by M. microti.
In addition, three other small insertions were found, which are due to the
presence of an IS6110 element in a different location than in M. tuberculosis H37Rv and
M. bovis AF2122/97. Indeed, PvuII RFLP analysis of M. microti OV254 reveals 13
IS6110 elements (data not shown).

Genomic diversity of M. microti strains. To obtain a more global picture of the
genetic organization of the taxon M. microti, we evaluated the presence or absence of the
variable regions found in strain OV254 in eight other M. microti strains. These strains,
which were isolated from humans and voles, have been designated as M. microti mainly
on the basis of their specific spoligotype (26, 32, 43). They can be further divided into
subgroups according to host, such as voles, llamas and humans (Table 4). As stated in
the introduction, M. microti, unlike M. tuberculosis, is rarely found in humans, so the
availability of 9 strains from various sources for genetic characterization is an
exceptional resource. Among them was one strain (Myc 94-2272) from a severely
immunocompromised individual (43), and four strains were isolated from HIV-positive
or HIV-negative humans with spoligotypes typical of llama and mouse isolates. For one
strain, ATCC 35782 / M.P. Prague, we could not identify with certainty the original host
from which the strain was isolated. Nor could we establish whether this strain corresponds
to M. microti OV166, which was received by Dr. Sula from Dr. Wells and used thereafter
for the vaccination program in Prague in the 1960s (38).

First, we asked whether these nine strains, designated as M. microti on the basis of
their spoligotypes, also resembled each other by other molecular typing criteria. RFLP
of pulsed-field gel-separated chromosomal DNA is probably the most accurate
molecular typing strategy for bacterial isolates, so we determined the AseI profiles of the
available M. microti strains. The profiles resembled each other closely but
differed significantly from the macro-restriction patterns of the M. tuberculosis, M. bovis
and M. bovis BCG strains used as controls. However, as depicted in Figure 8A, the
patterns were not identical, and each M. microti strain showed subtle
differences, suggesting that they were not epidemiologically related. A similar
observation was made with other rare-cutting restriction enzymes, such as DraI or XbaI (data
not shown).

Common and diverging features of M. microti strains. Two strategies were used to
test for the presence or absence of variable regions in these strains, for which we do not
have ordered BAC libraries. First, PCRs using internal and flanking primers of the
variable regions were employed, and amplification products of the junction regions were
sequenced. Second, probes from the internal portion of variable regions absent from M.
microti OV254 were obtained by amplification of M. tuberculosis H37Rv DNA using
specific primers. Hybridization with these radiolabeled probes was carried out on blots
of PFGE-separated AseI restriction digests of the M. microti strains. In addition, we
confirmed the findings obtained by these two techniques using a focused macro-array
containing some of the genes identified in variable regions of the tubercle bacilli to date
(data not shown).

This led to the finding that the RD1mic deletion is common to all M. microti strains
tested.

Indeed, none of the M. microti DNA digests hybridized with the radiolabeled esat-6
probe (Fig. 8B), whereas they did hybridize with the RD1mic flanking-region probe (Fig. 8C).
In addition, PCR amplification using primers flanking the RD1mic region (Table 3) yielded
fragments of the same size for all M. microti strains, whereas no products were obtained for
M. tuberculosis, M. bovis and M. bovis BCG strains (Fig. 9). Furthermore, the sequence of
the junction region was identical among the strains, which confirms that the
genomic organization of the RD1mic locus was the same in all tested M. microti strains
(Table 3). This clearly demonstrates that M. microti lacks the conserved ESAT-6 family
core region, which in other members of the M. tuberculosis complex stretches from Rv3864
to Rv3876. M. microti, as such, represents a taxon of naturally occurring ESAT-6 / CFP-10
deletion mutants.
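Coordinate arithmetic on the values in Table 3 shows why a single primer pair gives a short product only in M. microti. The sketch makes two reading assumptions: the primer names give their 5' positions on the H37Rv chromosome, and the junction coordinates bound the deleted segment. The results are therefore approximate to within a few base pairs. The function name is hypothetical.

```python
def junction_amplicon_bp(fwd_pos, rev_pos, del_start, del_end):
    """Approximate product sizes (bp) without and with the deletion.

    Positions are H37Rv coordinates; the deleted length is taken as
    del_end - del_start, which may be off by one depending on convention.
    """
    intact = rev_pos - fwd_pos + 1
    deleted = intact - (del_end - del_start)
    return intact, deleted


# RD1mic: primers 4340,209F / 4354,701R; junction 4340,421-4354,533 (Table 3).
intact_bp, microti_bp = junction_amplicon_bp(4_340_209, 4_354_701,
                                             4_340_421, 4_354_533)
print(f"intact locus: ~{intact_bp} bp; RD1mic deletion: ~{microti_bp} bp")
```

Under these assumptions, a genome that retains the RD1mic segment would have to yield a product of about 14.5 kb. That is far longer than Taq polymerase is likely to extend in the 2 min step given in the Methods. The deleted locus, by contrast, gives a product of roughly 380 bp.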

Like RD1mic, MiD3 was found to be absent from all nine M. microti strains tested and,
therefore, appears to be a specific genetic marker restricted to M. microti strains
(Table 4). However, PCR amplification showed that RD5mic is absent only from the vole
isolates OV254, OV216 and OV183, but present in the M. microti strains isolated from
humans and other sources (Table 4). This was confirmed by the presence of single bands,
of differing sizes, on a Southern blot hybridized with a plcA probe for all tested M. microti
strains except OV254 (Fig. 8D). Interestingly, the presence or absence of RD5mic
correlated with the similarity of IS6110 RFLP profiles: the profiles of the three M.
microti strains isolated from voles in the UK differed considerably from the IS6110
RFLP patterns of human isolates (43). Taken together, these results support the
proposed involvement of IS6110-mediated deletion of the RD5 region. They further
suggest that RD5 may be involved in the variable potential of M. microti strains to cause
disease in humans. Similarly, MiD1 was missing only from the vole
isolates OV254, OV216 and OV183, which display the same spoligotype (43).
This confirms that MiD1 confers the particular spoligotype of a group of
M. microti strains isolated from voles. In contrast, PCR analysis revealed that MiD1 is
only partially deleted from strains B3 and B4, both characterized by the mouse
spoligotype, and from the human isolate M. microti Myc 94-2272 (Table 4). For strain ATCC
35782, deletion of the MiD1 region was not observed. These findings correlate with the
described spoligotypes of the different isolates: strains with intact or partially
deleted MiD1 regions had more spacers than the vole isolates, which showed only
spacers 37 and 38.

2.3 COMMENTS AND DISCUSSION

We have searched for major genomic variations, due to insertion-deletion events,
between the vole pathogen, M. microti, and the human pathogen, M. tuberculosis. BAC-based
comparative genomics led to the identification of 10 regions absent from the
genome of the vole bacillus M. microti OV254 and of several insertions due to IS6110.
Seven of these deletion regions were also absent from eight other M. microti strains,
isolated from voles or humans, and they account for more than 60 kb of genomic DNA.

Of these regions, RD1mic is of particular interest, because absence of part of this region
had until now been found only in the BCG vaccine strains. As M. microti was
originally described as non-pathogenic for humans, it is proposed here that RD1 genes are
involved in pathogenicity for humans. This is reinforced by the fact that RD1bcg (29)
has lost putative ORFs belonging to the esat-6 gene cluster, including the genes encoding
ESAT-6 and CFP-10 (Fig. 6) (40). Both polypeptides have been shown to act as potent
stimulators of the immune system and are antigens recognized during the early stages of
infection (8, 12, 20, 34). Moreover, the biological importance of this RD1 region for
mycobacteria is underlined by the fact that it is also conserved in M. leprae, where genes
ML0047-ML0056 show high similarity in their sequence and operon organization to
the genes in the esat-6 core region of the tubercle bacilli (11). In spite of the radical gene
decay observed in M. leprae, the esat-6 operon has apparently kept its functionality in
this organism.

However, the RD1 deletion may not be the only reason why the vole bacillus is
attenuated for humans. Indeed, it remains unclear why certain M. microti strains included
in the present study, which show exactly the same RD1mic deletion as vole isolates, have
been found as causative agents of human tuberculosis. As human M. microti cases are
extremely rare, the most plausible explanation for this phenomenon would be that the
infected people were particularly susceptible to mycobacterial infections in general.
This could have been due to an immunodeficiency (32, 43) or to a rare genetic host
predisposition such as interferon gamma or IL-12 receptor modification (22).

In addition, the finding that human M. microti isolates differed from vole isolates by the
presence of region RD5mic may also have an impact on the increased potential of human
M. microti isolates to cause disease. Intriguingly, BCG and the vole bacillus lack
overlapping portions of this chromosomal region, which encompasses three (plcA, plcB,
plcC) of the four genes encoding phospholipase C (PLC) in M. tuberculosis. PLC has
been recognized as an important virulence factor in numerous bacteria, including
Clostridium perfringens, Listeria monocytogenes and Pseudomonas aeruginosa, where
it plays a role in cell-to-cell spread of bacteria, intracellular survival, and cytolysis (36,
41). To date, the exact role of PLC for the tubercle bacilli remains unclear. plcA encodes
the antigen mtp40, which has previously been shown to be absent from seven tested vole
and hyrax isolates (28). Phospholipase C activity in M. tuberculosis, M. microti and M.
bovis, but not in M. bovis BCG, has been reported (21, 47). However, PLC and
sphingomyelinase activities have been found associated with the most virulent
mycobacterial species (21). The levels of phospholipase C activity detected in M. bovis
were much lower than those seen in M. tuberculosis, consistent with the loss of plcABC.
It is likely that plcD is responsible for the residual phospholipase C activity in strains
lacking RD5, such as M. bovis and M. microti OV254. Indeed, the plcD gene is located
in region RvD2, which is present in some but not all tubercle bacilli (13, 18).
Phospholipase-encoding genes have been recognized as hotspots for integration of
IS6110, and it appears that the regions RD5 and RvD2 undergo independent deletion
processes more frequently than any other genomic regions (44). Thus, the virulence of
some M. microti strains may be due to a combination of functional phospholipase C
encoding genes (7, 25, 26, 29).

Another intriguing detail revealed by this study is that seven of the deleted genes
code for members of the PPE family of Gly-, Ala- and Asn-rich proteins. A closer look at
the sequences of these genes showed that in some cases they encode small proteins with
unique sequences, for example Rv3873, located in the RD1mic region, or Rv2352c
and Rv2353c, located in the RD5mic region. Others, like Rv3347c, located in the MiD3
region, code for a much larger PPE protein (3157 aa). In this case a neighboring gene
(Rv3345c), belonging to another multigene family, the PE-PGRS family, was partly
affected by the MiD3 deletion. While the function of the PE/PPE proteins is currently
unknown, their predicted abundance in the proteome of M. tuberculosis suggests that
they may play an important role in the life cycle of the tubercle bacilli. Indeed, some of
them were recently shown to be involved in the pathogenicity of M. tuberculosis strains
(9). Complementation of such genomic regions in M. microti OV254 should enable us to
carry out proteomics and virulence studies in animals in order to understand the role of
such ORFs in pathogenesis.

In conclusion, this study has shown that M. microti, a taxon originally named after its
major host Microtus agrestis, the common vole, represents a relatively homogeneous
group of tubercle bacilli. Although all tested strains showed unique PFGE macro-restriction
patterns that differed slightly from each other, deletions that were common
to all M. microti isolates (RD7-RD10, MiD3, RD1mic) have been identified. The
conserved nature of these deletions suggests that these strains are derived from a
common precursor that has lost these regions, and their loss may account for some of the
observed common phenotypic properties of M. microti, like the very slow growth on
solid media and the formation of tiny colonies. This finding is consistent with results
from a recent study that showed that M. microti strains carry a particular mutation in the
gyrB gene (31).

Of particular interest, some of these common features (e.g. the flanking regions of
RD1mic or MiD3) could be exploited for an easy-to-perform PCR identification test,
similar to the one proposed for a range of tubercle bacilli (33). Such a test would enable
unambiguous and rapid identification of M. microti isolates, in order to obtain a better
estimate of the overall rate of M. microti infections in humans and other mammalian
species.
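The PCR identification test proposed above can be pictured as a simple decision rule. The sketch below is a hypothetical illustration, not a validated assay: it assumes junction PCRs with primers placed in the flanking regions, so that a product is obtained only when the intervening region (RD1mic or MiD3) is deleted. The function and constant names are illustrative only.

```python
"""Hypothetical decision rule for a junction-PCR identification of M. microti.

Assumption: each PCR uses primers in the regions flanking a deletion, so a
product is expected only when that region is absent from the isolate.
"""

# Only the two regions named in the text as candidate targets are used here.
MICROTI_MARKERS = ("RD1mic", "MiD3")


def is_microti_profile(junction_products: dict[str, bool]) -> bool:
    """Return True when a junction product was seen for every marker deletion.

    junction_products maps a region name to True if the flanking-primer PCR
    gave a product (region deleted); missing entries count as "no product".
    """
    return all(junction_products.get(region, False) for region in MICROTI_MARKERS)


if __name__ == "__main__":
    # Illustrative inputs only, not experimental results.
    print(is_microti_profile({"RD1mic": True, "MiD3": True}))   # True
    print(is_microti_profile({"RD1mic": False, "MiD3": True}))  # False
```

Requiring every marker keeps the rule conservative: an isolate carrying only one of the deletions is not called M. microti by this sketch.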




Example 3: Recombinant BCG exporting ESAT-6 confers enhanced protection 
against tuberculosis 




3.1 Complementation of the RD1 locus of BCG Pasteur and M. microti 

To construct a recombinant vaccine that secretes both ESAT-6 and CFP-10, we
complemented BCG Pasteur for the RD1 region using genomic fragments spanning
variable sections of the esxBA (or ESAT-6) locus from M. tuberculosis (Fig. 10). The
RD1 deletion in BCG interrupts or removes nine CDS and affects all four transcriptional
units: three are removed entirely, while the fourth (Rv3867-Rv3871) is largely intact
apart from the loss of 112 codons from the 3'-end of Rv3871 (Fig. 10). Transcriptome
analysis of BCG, performed with oligonucleotide-based microarrays using cDNA probes
obtained from early log-phase cultures, detected signals at least two-fold
greater than background for the probes corresponding to Rv3867 to Rv3871 inclusive, but
not for the RD1-deleted genes Rv3872 to Rv3879. This suggests that the Rv3867-Rv3871
transcriptional unit is still active in BCG which, like M. bovis, also has frameshifts in the
neighbouring gene, Rv3881 (Fig. 10). The RD1mic deletion of M. microti removes three
transcriptional units completely, with only gene Rv3877 remaining from the fourth. The
M. tuberculosis clinical isolate MT56 has lost genes Rv3878-Rv3879 (Brosch, R., et al.
A new evolutionary scenario for the Mycobacterium tuberculosis complex. Proc Natl
Acad Sci USA 99, 3684-9 (2002)) but still secretes ESAT-6 and CFP-10 (Fig. 10).
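The detection criterion used in the transcriptome analysis (a signal at least two-fold above background) amounts to a ratio filter. A minimal sketch, assuming raw intensities and a single background estimate per array; the intensity values in the example are invented placeholders, not the measured data, and the function name is illustrative.

```python
def detected_genes(signals: dict[str, float], background: float,
                   min_ratio: float = 2.0) -> list[str]:
    """Return genes whose probe signal is at least `min_ratio` times background.

    Assumes raw (not background-subtracted) intensities and one background
    estimate for the whole array; input order is preserved in the output.
    """
    if background <= 0:
        raise ValueError("background must be positive")
    return [gene for gene, value in signals.items() if value >= min_ratio * background]


if __name__ == "__main__":
    # Placeholder intensities for illustration only.
    example = {"Rv3867": 900.0, "Rv3871": 650.0, "Rv3872": 120.0, "Rv3879": 90.0}
    print(detected_genes(example, background=100.0))  # ['Rv3867', 'Rv3871']
```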


To test the hypothesis that a dedicated export machinery exists, and to establish which
genes were essential for creating an ESAT-6-CFP-10-secreting vaccine, we assembled a
series of integrating vectors carrying fragments spanning different portions of the RD1
esx gene cluster (Fig. 10). These integrating vectors stably insert into the attB site of the
genome of tubercle bacilli. pAP34 was designed to carry only the antigenic core region
encoding ESAT-6 and CFP-10, and the upstream PE and PPE genes, whereas RD1-I106
and RD1-pAP35 were selected to include the core region and either the downstream or
upstream portion of the gene cluster, respectively. The fourth construct, RD1-2F9,
contains a ~32 kb segment from M. tuberculosis that stretches from Rv3861 to Rv3885,
covering the entire RD1 gene cluster. We adopted this strategy of complementation with
large genomic fragments to avoid polar effects that might be expected if a putative
protein complex were only partially complemented in trans. In addition, a set of smaller
expression constructs (pAP47, pAP48) was established in which individual genes are
transcribed from a heat shock promoter (Fig. 10). Using appropriate antibodies, all of
these constructs were found to produce the corresponding proteins after transformation
of BCG or M. microti (see below).

3.2 Several genes of the esx cluster are required for export of ESAT-6 and CFP-10

The four BCG::RD1 recombinants (BCG::RD1-pAP34, BCG::RD1-pAP35, BCG::RD1-2F9
and BCG::RD1-I106) (Fig. 11) were initially tested to ensure that ESAT-6 and CFP-10
were being appropriately expressed from the respective integrated constructs.

Immunoblotting of whole cell protein extracts from mid-log phase cultures of the
various BCG::RD1 recombinants, using an ESAT-6 monoclonal antibody or polyclonal
sera for CFP-10 and the PPE68 protein Rv3873, demonstrated that all three proteins were
expressed from the four constructs at levels comparable to those of M. tuberculosis (Fig.
11). However, striking differences were seen when the supernatants from early log-phase
cultures of each recombinant were screened by Western blot for the two antigens.
Although low levels of ESAT-6 and CFP-10 could be detected in the concentrated
supernatant protein fractions of BCG::RD1-pAP34, BCG::RD1-pAP35 and BCG::RD1-I106,
only with the integrated construct encompassing the entire esx gene cluster
(BCG::RD1-2F9) did the two antigens accumulate in significant amounts. The high
concentrations of ESAT-6 and CFP-10 seen in the supernatant of the recombinant
BCG::RD1-2F9 were not due to a non-specific increase in permeability or loss of cell
wall material: when the same whole cell and supernatant protein fractions were
immunoblotted with serum raised against Rv3873, this protein was localized only in the
cell wall of the various recombinants. As expected, when constructs containing esxA
or esxBA alone were used, ESAT-6 did not accumulate in the culture supernatant
(data not shown).

To assess the effect of the RD1mic deletion of M. microti on the export of ESAT-6 and
CFP-10 and subsequent antigen handling, the experiments were replicated in this
genomic background. As with BCG, ESAT-6 and CFP-10 were only exported into the
supernatant fraction in significant amounts if expressed in conjunction with the entire
esx cluster (Fig. 11). The combined findings demonstrate that complementation with
esxA or esxB alone is insufficient to produce a recombinant vaccine that secretes these
two antigens. Rather, secretion requires expression of genes located both upstream and
downstream of the antigenic core region, confirming our hypothesis that the conserved
esx gene cluster does indeed encode functions essential for the export of ESAT-6 and
CFP-10.


3.3 Secretion of ESAT-6 is needed to induce antigen-specific T-cell responses

Since the classical observation that inoculation with live, but not dead, BCG confers
protection against tuberculosis in animal models, it has been considered that secretion of
antigens is critical for maximizing protective T-cell immunity. Using our panel of
recombinant vaccines, we were able to test whether antigen secretion was indeed essential for
eliciting ESAT-6-specific T-cell responses. Groups of C57BL/6 mice were inoculated
subcutaneously with one of six recombinant vaccines (BCG-pAP47, BCG-pAP48,
BCG::RD1-pAP34, BCG::RD1-pAP35, BCG::RD1-I106, BCG::RD1-2F9) or with BCG
transformed with the empty vector pYUB412. Three weeks after vaccination, T-cell
immune responses to the seven vaccines were assessed by comparing antigen-specific
splenocyte proliferation and gamma interferon (IFN-γ) production (Fig. 12A). As
anticipated, all of the vaccines generated splenocyte proliferation and IFN-γ production in
response to PPD (purified protein derivative) but not to an unrelated MalE
control peptide, indicating successful vaccination in each case. However, only
splenocytes from the mice inoculated with BCG::RD1-2F9 proliferated markedly in
response to the immunodominant ESAT-6 peptide (Fig. 12A). Furthermore, IFN-γ was
only detected in culture supernatants of splenocytes from mice immunized with
BCG::RD1-2F9 following incubation with the ESAT-6 peptide (Fig. 12B) or
recombinant CFP-10 protein (data not shown). These data demonstrate that export of the
antigens is essential for stimulating specific Th1-oriented T-cells.

Further characterization of the immune responses was carried out. Splenocytes from
mice immunized with BCG::RD1-2F9 or control BCG both proliferated in response to
the immunodominant antigen 85A peptide (Fig. 13A). The strong splenocyte
proliferation in the presence of ESAT-6 was abolished by an anti-CD4 monoclonal
antibody but not by anti-CD8, indicating that the CD4+ T-cell subset was involved (Fig.
13B). Interestingly, as judged by the in vitro IFN-γ response to PPD and the ESAT-6 peptide,
subcutaneous immunization generated much stronger T-cell responses (Fig. 13C)
than intravenous injection. After subcutaneous immunisation with BCG::RD1-2F9,
strong ESAT-6-specific responses were also detected in inguinal lymph nodes (data
not shown). These experiments demonstrated that the ESAT-6 T-cell immune responses
to vaccination with BCG::RD1-2F9 were potent, reproducible and robust, making this
recombinant an excellent candidate for protection studies.

3.4 Protective efficacy of BCG::RD1-2F9 in immunocompetent mice

When used alone as a subunit or DNA vaccine, ESAT-6 induces levels of protection
weaker than, but akin to, those of BCG (Brandt, L., Elhay, M., Rosenkrands, I., Lindblad, E.B. &
Andersen, P. ESAT-6 subunit vaccination against Mycobacterium tuberculosis. Infect Immun 68,
791-795 (2000)). It was therefore of interest to determine whether presentation to the immune system
of ESAT-6 and/or CFP-10 in the context of recombinant BCG, mimicking the
presentation of the antigens during natural infection, could increase the protective
efficacy of BCG. The BCG::RD1-2F9 recombinant was selected for testing
as a vaccine, since it was the only ESAT-6-exporting BCG that elicited vigorous antigen-specific
T-cell immune responses. Groups of C57BL/6 mice were inoculated
intravenously with either BCG::RD1-2F9 or BCG::pYUB412 and challenged
intravenously after eight weeks with M. tuberculosis H37Rv. Growth of M. tuberculosis
H37Rv in spleens and lungs of each vaccinated cohort was compared with that of
unvaccinated controls two months after infection (Fig. 14A). This demonstrated that,
compared to vaccination with BCG, the BCG::RD1-2F9 vaccine inhibited growth of M.
tuberculosis H37Rv in the spleens by 0.4 log10 CFU and was of comparable efficacy at
protecting the lungs.
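Protection in these experiments is expressed as a difference in log10 CFU per organ between groups. The short sketch below (function names are illustrative) shows how such a difference relates to a fold reduction: 0.4 log10 corresponds to roughly a 2.5-fold lower bacterial load, whereas a greater than ten-fold reduction, as reported in the guinea pig experiments, corresponds to more than 1 log10.

```python
import math


def log10_difference(cfu_reference: float, cfu_test: float) -> float:
    """Reference minus test, in log10 CFU; positive means fewer bacteria in the test group."""
    if cfu_reference <= 0 or cfu_test <= 0:
        raise ValueError("CFU counts must be positive")
    return math.log10(cfu_reference) - math.log10(cfu_test)


def fold_reduction(log10_diff: float) -> float:
    """Convert a log10 difference into a fold reduction."""
    return 10 ** log10_diff


if __name__ == "__main__":
    print(round(fold_reduction(0.4), 2))                 # 2.51
    print(round(log10_difference(1.0e6, 1.0e5), 3))      # 1.0, i.e. a ten-fold reduction
```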

To investigate this enhanced protective effect against tuberculosis further, we repeated
the challenge experiment using the aerosol route. In this experiment antibiotic treatment
was employed to clear persisting BCG from mouse organs prior to infection with M.
tuberculosis. Two months after vaccination, C57BL/6 mice were treated daily with
rifampicin/isoniazid for three weeks and then infected with 1000 CFU of M.
tuberculosis H37Rv by the respiratory route. Mice were sacrificed after 17, 35 and
63 days, and bacterial enumeration was carried out on the lungs and spleen. This
demonstrated that, even following respiratory infection, vaccination with BCG::RD1-2F9
was superior to vaccination with the control strain of BCG (Fig. 14B). However,
growth of M. tuberculosis was again only strongly inhibited in the mouse spleens.



Example 4: Protective efficacy of BCG::RD1-2F9 in guinea pigs 






4.1 Animal models

M. tuberculosis H37Rv and the different recombinant vaccines
were prepared in the same manner as for the immunological assays. For the guinea pig
assays, groups of outbred female Dunkin-Hartley guinea pigs (David Hall, UK) were
inoculated with 5 x 10^4 CFU by the subcutaneous route. Aerosol challenge was
performed 8 weeks after vaccination using a contained Henderson apparatus and an
H37Rv (NCTC 7416) suspension, in order to obtain an estimated retained inhaled dose of
approximately 1000 CFU/lung (Williams, A., Davies, A., Marsh, P.D., Chambers, M.A. &
Hewinson, R.G. Comparison of the protective efficacy of bacille Calmette-Guérin vaccination against
aerosol challenge with Mycobacterium tuberculosis and Mycobacterium bovis. Clin Infect Dis 30 Suppl 3,
S299-301 (2000)). Organs were homogenized and dilutions plated out on 7H11 agar, as for
the mouse experiments. Guinea pig experiments were carried out in the framework of the
European Union TB vaccine development program.
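Bacterial loads are obtained by plating dilutions of organ homogenates. As a hedged sketch of the usual back-calculation (the homogenate and plated volumes are assumptions, since the protocol above does not state them, and the function name is illustrative), the CFU per organ is the colony count multiplied by the dilution factor, divided by the plated volume and multiplied by the total homogenate volume.

```python
import math


def cfu_per_organ(colonies: int, dilution_factor: float,
                  plated_volume_ml: float, homogenate_volume_ml: float) -> float:
    """Back-calculate total CFU in an organ from one countable plate.

    dilution_factor: e.g. 1000 for a 10^-3 dilution of the homogenate.
    The volumes are hypothetical inputs; the source protocol does not give them.
    """
    if plated_volume_ml <= 0 or homogenate_volume_ml <= 0:
        raise ValueError("volumes must be positive")
    cfu_per_ml = colonies * dilution_factor / plated_volume_ml
    return cfu_per_ml * homogenate_volume_ml


if __name__ == "__main__":
    # 50 colonies on a 10^-3 plate, 0.1 ml plated, 5 ml homogenate (illustrative values).
    total = cfu_per_organ(50, 1000, 0.1, 5.0)
    print(f"{total:.2e} CFU, {math.log10(total):.2f} log10 CFU")  # 2.50e+06 CFU, 6.40 log10 CFU
```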



4.2 Results

Although experiments in mice convincingly demonstrated a superior
protective efficacy of BCG::RD1 over BCG, it was important to establish a similar effect
in the guinea pig model of tuberculosis. Guinea pigs are exquisitely sensitive to
tuberculosis, succumbing rapidly to low-dose infection with M. tuberculosis, and
develop a necrotic granulomatous pathology closer to that of human tuberculosis.
Immunization of guinea pigs with BCG::RD1-2F9 was therefore compared to
conventional BCG vaccination. Groups of six guinea pigs were inoculated
subcutaneously with saline, BCG or BCG::RD1-2F9. Eight weeks after inoculation,
the three guinea pig cohorts were challenged with M. tuberculosis H37Rv via the aerosol
route. Individual animals were weighed weekly and were killed 17 weeks after challenge,
or earlier if they developed signs of severe tuberculosis. Whereas all unvaccinated
guinea pigs failed to thrive and were euthanised before the last time-point because of
overwhelming disease, both the BCG- and the recombinant BCG::RD1-2F9-vaccinated
animals progressively gained weight and were clinically well when killed on termination
of the experiment (Fig. 15A). This indicated that although the BCG::RD1-2F9
recombinant is more virulent in severely immunodeficient mice (Pym, A.S., Brodin, P.,
Brosch, R., Huerre, M. & Cole, S.T. Loss of RD1 contributed to the attenuation of the
live tuberculosis vaccines Mycobacterium bovis BCG and Mycobacterium microti. Mol.
Microbiol. 46, 709-717 (2002)), there is no increased pathogenesis in the highly
susceptible guinea pig model of tuberculosis. Moreover, when the bacterial loads in the
spleens of the vaccinated animals were compared, there was a greater than ten-fold
reduction in the number of CFU recovered from the animals immunised with
BCG::RD1-2F9 compared to BCG (Fig. 15B). Interestingly, there was no
significant difference between the number of CFU obtained from the lungs of the two
vaccinated groups, indicating that the organ-specific enhanced protection observed in
mice vaccinated with BCG::RD1-2F9 was also seen with guinea pigs. This marked
reduction of bacterial loads in the spleens of BCG::RD1-2F9-immunised animals was
also reflected in the gross pathology. Visual examination showed that tubercles
were much larger and more numerous on the surface of the spleens of BCG-vaccinated
guinea pigs (Fig. 15C). These results demonstrate that the recombinant vaccine
BCG::RD1-2F9 confers enhanced protection against an aerosol challenge with M.
tuberculosis in two distinct animal models.



GENERAL CONCLUSION 

Tuberculosis is still one of the leading infectious causes of death in the world despite a
decade of improving delivery of treatment and control strategies (Dye, C., Scheele, S.,
Dolin, P., Pathania, V. & Raviglione, M.C. Consensus statement. Global burden of
tuberculosis: estimated incidence, prevalence, and mortality by country. WHO Global
Surveillance and Monitoring Project. JAMA 282, 677-86 (1999)). Reasons for the
recalcitrance of this pandemic are multi-factorial but include the modest efficacy of the
widely used vaccine, BCG. Two broad approaches can be distinguished for the
development of improved tuberculosis vaccines (Baldwin, S.L., et al. Evaluation of new
vaccines in the mouse and guinea pig model of tuberculosis. Infection & Immunity 66,
2951-9 (1998); Kaufmann, S.H.E. How can immunology contribute to the control of
tuberculosis? Nature Rev Immunol 1, 20-30 (2001); and Young, D.B. & Fruth, U. in New
Generation Vaccines (eds. Levine, M., Woodrow, G., Kaper, J. & Cobon, G.S.) 631-645
(Marcel Dekker, 1997)). These are the development of subunit vaccines based on
purified protein antigens, or of new live vaccines that stimulate a broader range of immune
responses. Although a growing list of individual or combination subunit vaccines, and
hybrid proteins, have been tested, none has yet proved superior to BCG in animal models
(Baldwin, S.L., et al., 1998). Similarly, new attenuated vaccines derived from virulent
M. tuberculosis have yet to outperform BCG (Jackson, M., et al. Persistence and
protective efficacy of a Mycobacterium tuberculosis auxotroph vaccine. Infect Immun
67, 2867-73 (1999); and Hondalus, M.K., et al. Attenuation of and protection induced by
a leucine auxotroph of Mycobacterium tuberculosis. Infect Immun 68, 2888-98 (2000)).
Interestingly, the only vaccine that appears to surpass BCG is a BCG recombinant
over-expressing antigen 85B, the 30-kDa major secretory protein (Horwitz, M.A., Harth, G., Dillon, B.J. & Maslesa-Galic, S.
Recombinant bacillus Calmette-Guérin (BCG) vaccines expressing the Mycobacterium
tuberculosis 30-kDa major secretory protein induce greater protective immunity against
tuberculosis than conventional BCG vaccines in a highly susceptible animal model. Proc
Natl Acad Sci USA 97, 13853-8 (2000)). The basis for this vaccine was the notion that
over-expression of an immunodominant T-cell antigen could quantitatively enhance the
BCG-elicited immune response.

Within the framework of the invention, we were able to show that restoration of the RD1 locus did
indeed improve the protective efficacy of BCG, which defines a genetic modification that
should be included in new recombinant BCG vaccines. Moreover, we were able to
demonstrate two further findings that will be crucial for the development of a live
vaccine against tuberculosis. First, we have identified the genetic basis of secretion for
the ESAT-6 family of immunodominant T-cell antigens, and second, we show that
export of these antigens from the cytosol is essential for maximizing their antigenicity.

The extracellular proteins of M. tuberculosis have been extensively studied and shown
to be a rich source of protective antigens (Sorensen, A.L., Nagai, S., Houen, G.,
Andersen, P. & Andersen, A.B. Purification and characterization of a low-molecular-mass
T-cell antigen secreted by Mycobacterium tuberculosis. Infect Immun 63, 1710-7
(1995); Skjøt, R.L.V., et al. Comparative evaluation of low-molecular-mass proteins
from Mycobacterium tuberculosis identifies members of the ESAT-6 family as
immunodominant T-cell antigens. Infect Immun 68, 214-220 (2000); Horwitz, M.A.,
Lee, B.W., Dillon, B.J. & Harth, G. Protective immunity against tuberculosis induced by
vaccination with major extracellular proteins of Mycobacterium tuberculosis. Proc Natl
Acad Sci USA 92, 1530-4 (1995); and Boesen, H., Jensen, B.N., Wilcke, T. & Andersen,
P. Human T-cell responses to secreted antigen fractions of Mycobacterium tuberculosis.
Infect Immun 63, 1491-7 (1995)). Despite this, it remains unclear how some of these
proteins, which lack conventional secretion signals, are exported from the cytosol, a unique
problem in M. tuberculosis given the impermeability and waxy nature of the
mycobacterial cell envelope. Although two secA orthologues were identified in the
genome sequence of M. tuberculosis (Cole, S.T., et al. Deciphering the biology of
Mycobacterium tuberculosis from the complete genome sequence. Nature 393, 537-544
(1998)), no genes for obvious type I, II or III protein secretion systems were detected,
like those that mediate the virulence of many Gram-negative bacterial pathogens (Finlay,
B.B. & Falkow, S. Common themes in microbial pathogenicity revisited. Microbiol.
Mol. Biol. Rev. 61, 136-169 (1997)). This suggested that novel secretion systems might
exist. An in silico analysis of the M. tuberculosis proteome identified a set of proteins
and genes whose inferred functions, genomic organisation and strict association with the
esx gene family suggested that they could constitute such a system (Tekaia, F., et al.
Analysis of the proteome of Mycobacterium tuberculosis in silico. Tubercle Lung
Disease 79, 329-342 (1999)). Our results provide the first empirical evidence that this
gene cluster is essential for the normal export of ESAT-6 and CFP-10.

The antigen genes, esxBA, lie at the centre of the conserved gene cluster. Bioinformatics
and comparative genomics predicted that both the conserved upstream genes Rv3868-Rv3871
and the downstream genes Rv3876-Rv3877 would be required for
secretion (Fig. 1), and strong experimental support for this prediction is provided here.
Our experiments show that maximal export of ESAT-6 and CFP-10 is obtained only when
BCG or M. microti is complemented with the entire cluster. This suggests that at
least Rv3871 and either Rv3876 or Rv3877 are indeed essential for the normal secretion
of ESAT-6, as these are the only conserved genes absent or disrupted in BCG that are
not complemented by RD1-I106 or RD1-pAP35. These genes encode a large
transmembrane protein with ATPase activity, an ATP-dependent chaperone and an
integral membrane protein, functional predictions compatible with their being part of a
multi-protein complex involved in the translocation of polypeptides. Amongst the
proteins encoded by the esx cluster, Rv3871 and Rv3877 are highly conserved, as
orthologues have been identified in the more streamlined clusters found in other
actinomycetes, further supporting their direct role in secretion (Gey Van Pittius, N.C., et
al. The ESAT-6 gene cluster of Mycobacterium tuberculosis and other high G+C Gram-positive
bacteria. Genome Biol 2, 44.1-44.18 (2001)). It has been shown recently that
ESAT-6 and CFP-10 form a heterodimer in vitro (Renshaw, P.S., et al. Conclusive
evidence that the major T-cell antigens of the M. tuberculosis complex ESAT-6 and
CFP-10 form a tight, 1:1 complex and characterisation of the structural properties of
ESAT-6, CFP-10 and the ESAT-6-CFP-10 complex: implications for pathogenesis and
virulence. J Biol Chem 8, 8 (2002)), but it is not known whether dimerisation precedes
translocation across the cell membrane or occurs at a later stage in vivo. In either case,
chaperone or protein clamp activity is likely to be required to assist dimer formation or
to prevent premature complexes arising, as is well documented for type III secretion
systems (Page, A.L. & Parsot, C. Chaperones of the type III secretion pathway: jacks of
all trades. Mol Microbiol 46, 1-11 (2002)). These, and other questions concerning the
precise roles of the individual components of the ESAT-6 secretory apparatus, can now
be addressed experimentally using the tools developed here.


The second major finding of the invention is that the secretion of ESAT-6 (and probably
CFP-10) is critical for inducing maximal T-cell responses, although other RD1-encoded
proteins may also contribute, such as the PPE68 protein (Rv3873), which is located in the
cell envelope. We show that even though whole cell expression levels of ESAT-6 are
comparable amongst our vaccines (Fig. 2), only the vaccine strain exporting ESAT-6 via
an intact secretory apparatus elicits powerful T-cell responses. Surprisingly, even the
recombinants RD1-pAP47 and RD1-pAP48, which overexpress ESAT-6 intracellularly,
did not generate detectable ESAT-6-specific T-cell responses. Although antigen
secretion has long been recognized as important for inducing immunity against M.
tuberculosis, and is often used to explain why killed BCG offers no protection, this is
one of the first formal demonstrations of its importance. BCG, like M. tuberculosis,
resides in the phagosome, where secreted antigens have ready access to the MHC class II
antigen processing pathway, essential for inducing the IFN-γ-producing CD4 T-cells
considered critical for protection against tuberculosis. Further understanding of the
mechanism of ESAT-6 secretion could allow the development of BCG recombinants that
deliver other antigens in the same way.

The main aim of the present invention was to qualitatively enhance the antigenicity of
BCG. Having assembled a recombinant vaccine that secreted the T-cell antigens
ESAT-6 and CFP-10, and shown that it elicited powerful CD4 T-cell immunity against
at least ESAT-6 and CFP-10, the next step was to rigorously test its efficacy in animal
models of tuberculosis. In three distinct models, including two involving respiratory
challenge, we were able to demonstrate that the ESAT-6-CFP-10-secreting recombinant
improved protection compared to a BCG control, although this effect was
restricted to the spleen. This is probably because the enhanced immunity
induced by the two additional antigens is insufficient to abort the primary infection but
does significantly reduce the dissemination of bacteria from the lung. The lack of
protection afforded to the lung, the portal of entry for M. tuberculosis, does not prevent
BCG::RD1-2F9 from being a promising vaccine candidate. Primary tuberculosis occurs
in the middle and lower lobes and is rarely symptomatic (Garay, S.M. in Tuberculosis
(eds. Rom, W.N. & Garay, S.M.) 373-413 (Little, Brown and Company, Boston, 1996)).
The bacteria need to reach the upper lobes, the commonest site of disease, by
haematogenous spread. Therefore, a vaccine that inhibits dissemination of M.
tuberculosis from the primary site of infection would probably have a major impact on the
outcome of tuberculosis.

Recombinant BCG vaccines have definite advantages over other vaccination strategies in
that they are inexpensive, easy to produce and convenient to store. However, despite an
unrivalled and enviable safety record, concerns remain and BCG is currently not
administered to individuals with HIV infection. As shown above, the recombinant
BCG::RD1-2F9 grows more rapidly than its parental BCG strain in Severe Combined
Immunodeficient (SCID) mice, an extreme model of immunodeficiency. However, in both
immunocompetent mice and guinea pigs we have not observed any increased pathology,
only a slight increase in persistence, which may be beneficial, since the declining efficacy
of BCG with serial passage has been attributed to an inadvertent increase in its
attenuation (Behr, M.A. & Small, P.M. Has BCG attenuated to impotence? Nature 389,
133-4 (1997)).

Ultimately, the robust enhancement in protection we have observed with the
reincorporation of the RD1 locus is a compelling reason to include this genetic
modification in any recombinant BCG vaccine, even if this requires a
balancing attenuating mutation.
In summary, the data presented here show that, in addition to its increased persistence,
BCG::RD1-2F9 induces specific T-cell memory and enhances immune responses to
other endogenous Th1 antigens such as the mycolyl transferase antigen 85A.
CLAIMS



1. A strain of M. bovis BCG or M. microti, wherein said strain has integrated all or part
of the fragment, named RD1-2F9, of 31808 bp of DNA originating from
Mycobacterium tuberculosis or any virulent member of the Mycobacterium
tuberculosis complex (M. africanum, M. bovis, M. canettii), as shown in SEQ ID No
1, and which is responsible for enhanced immunogenicity and increased persistence
of BCG to the tubercle bacilli.

2. A strain of M. bovis BCG or M. microti according to claim 1, wherein said strain has
integrated all or part of the fragment of DNA originating from Mycobacterium
tuberculosis or any virulent member of the Mycobacterium tuberculosis complex
(M. africanum, M. bovis, M. canettii) as shown in SEQ ID No 2, responsible for
enhanced immunogenicity and increased persistence of BCG to the tubercle bacilli.

3. A strain of M. bovis BCG or M. microti according to claim 1, wherein said strain has
integrated all or part of the fragment of DNA originating from Mycobacterium
tuberculosis or any virulent member of the Mycobacterium tuberculosis complex
(M. africanum, M. bovis, M. canettii) as shown in SEQ ID No 3, responsible for
enhanced immunogenicity and increased persistence of BCG to the tubercle bacilli.

4. A strain according to claim 1 which has integrated a portion of DNA originating
from Mycobacterium tuberculosis or any virulent member of the Mycobacterium
tuberculosis complex (M. africanum, M. bovis, M. canettii), which comprises at
least one, two, three or more gene(s) selected from Rv3861 (SEQ ID No 4), Rv3862
(SEQ ID No 5), Rv3863 (SEQ ID No 6), Rv3864 (SEQ ID No 7), Rv3865 (SEQ ID
No 8), Rv3866 (SEQ ID No 9), Rv3867 (SEQ ID No 10), Rv3868 (SEQ ID No 11),
Rv3869 (SEQ ID No 12), Rv3870 (SEQ ID No 13), Rv3871 (SEQ ID No 14),
Rv3872 (SEQ ID No 15, mycobacterial PE), Rv3873 (SEQ ID No 16, PPE), Rv3874
(SEQ ID No 17, CFP-10), Rv3875 (SEQ ID No 18, ESAT-6), Rv3876 (SEQ ID No
19), Rv3877 (SEQ ID No 20), Rv3878 (SEQ ID No 21), Rv3879 (SEQ ID No 22),
Rv3880 (SEQ ID No 23), Rv3881 (SEQ ID No 24), Rv3882 (SEQ ID No 25),
Rv3883 (SEQ ID No 26), Rv3884 (SEQ ID No 27) and Rv3885 (SEQ ID No 28).
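In the claims, the RD1 genes are numbered consecutively: Rv3861 is SEQ ID No 4 and each following Rv number adds one, up to Rv3885 (SEQ ID No 28). The short Python sketch below is a hypothetical helper, not part of the claimed subject matter, that encodes this offset so the SEQ ID references in the claims can be cross-checked; the function name and the product table are illustrative assumptions.

```python
# Hypothetical cross-check helper; the offset is inferred from the numbering in claim 4.
def seq_id_for(rv_tag: str) -> int:
    """Return the SEQ ID No used in these claims for an RD1 gene Rv3861 to Rv3885."""
    if not rv_tag.startswith("Rv"):
        raise ValueError(f"not an Rv locus tag: {rv_tag!r}")
    number = int(rv_tag[2:])
    if not 3861 <= number <= 3885:
        raise ValueError(f"{rv_tag} is outside the Rv3861 to Rv3885 range")
    return number - 3857  # Rv3861 -> 4, ..., Rv3885 -> 28


# Gene products named in the claims next to their SEQ ID numbers.
PRODUCTS = {"Rv3872": "mycobacterial PE", "Rv3873": "PPE",
            "Rv3874": "CFP-10", "Rv3875": "ESAT-6"}

if __name__ == "__main__":
    for tag in ("Rv3861", "Rv3874", "Rv3875", "Rv3876", "Rv3885"):
        print(tag, seq_id_for(tag), PRODUCTS.get(tag, ""))
```

Applied to Rv3876, for example, the helper returns 19, the number used for that gene throughout the claims.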

5. A strain according to claim 1 which has integrated a portion of DNA originating
from Mycobacterium tuberculosis or any virulent member of the Mycobacterium
tuberculosis complex (M. africanum, M. bovis, M. canettii), which comprises at
least one, two, three or more gene(s) selected from Rv3867 (SEQ ID No 10),
Rv3868 (SEQ ID No 11), Rv3869 (SEQ ID No 12), Rv3870 (SEQ ID No 13),
Rv3871 (SEQ ID No 14), Rv3872 (SEQ ID No 15, mycobacterial PE), Rv3873
(SEQ ID No 16, PPE), Rv3874 (SEQ ID No 17, CFP-10), Rv3875 (SEQ ID No 18,
ESAT-6), Rv3876 (SEQ ID No 19) and Rv3877 (SEQ ID No 20).

6. A strain according to claim 1 which has integrated a portion of DNA originating
from Mycobacterium tuberculosis or any virulent member of the Mycobacterium
tuberculosis complex (M. africanum, M. bovis, M. canettii), which comprises at
least one, two, three or more gene(s) selected from Rv3872 (SEQ ID No 15,
mycobacterial PE), Rv3873 (SEQ ID No 16, PPE), Rv3874 (SEQ ID No 17,
CFP-10) and Rv3875 (SEQ ID No 18, ESAT-6).

7. A strain according to claim 1 which has integrated a portion of DNA originating
from Mycobacterium tuberculosis or any virulent member of the Mycobacterium
tuberculosis complex (M. africanum, M. bovis, M. canettii), which comprises at
least four genes selected from Rv3861 (SEQ ID No 4), Rv3862 (SEQ ID No 5),
Rv3863 (SEQ ID No 6), Rv3864 (SEQ ID No 7), Rv3865 (SEQ ID No 8), Rv3866
(SEQ ID No 9), Rv3867 (SEQ ID No 10), Rv3868 (SEQ ID No 11), Rv3869 (SEQ
ID No 12), Rv3870 (SEQ ID No 13), Rv3871 (SEQ ID No 14), Rv3872 (SEQ ID
No 15, mycobacterial PE), Rv3873 (SEQ ID No 16, PPE), Rv3874 (SEQ ID No 17,
CFP-10), Rv3875 (SEQ ID No 18, ESAT-6), Rv3876 (SEQ ID No 19), Rv3877
(SEQ ID No 20), Rv3878 (SEQ ID No 21), Rv3879 (SEQ ID No 22), Rv3880 (SEQ
ID No 23), Rv3881 (SEQ ID No 24), Rv3882 (SEQ ID No 25), Rv3883 (SEQ ID
No 26), Rv3884 (SEQ ID No 27) and Rv3885 (SEQ ID No 28), provided that it
comprises Rv3874 (SEQ ID No 17, CFP-10) and/or Rv3875 (SEQ ID No 18,
ESAT-6).
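The gene-content condition of claim 7 can be read as a simple predicate: at least four of the listed genes Rv3861 to Rv3885 must be present, and at least one of them must be Rv3874 (CFP-10) or Rv3875 (ESAT-6). The Python sketch below is an illustrative, hypothetical encoding of that reading only; the names are assumptions, and it says nothing about expression or function of the integrated genes.

```python
from collections.abc import Iterable

# Genes listed in claim 7: Rv3861 to Rv3885 inclusive.
RD1_GENES = frozenset(f"Rv{n}" for n in range(3861, 3886))
# Rv3874 encodes CFP-10 and Rv3875 encodes ESAT-6.
REQUIRED_ANY = frozenset({"Rv3874", "Rv3875"})


def meets_claim_7(genes: Iterable[str]) -> bool:
    """Hypothetical reading of claim 7: >= 4 listed genes, including Rv3874 and/or Rv3875."""
    listed = set(genes) & RD1_GENES  # only genes from the claimed list are counted
    return len(listed) >= 4 and not listed.isdisjoint(REQUIRED_ANY)


print(meets_claim_7({"Rv3871", "Rv3872", "Rv3873", "Rv3875"}))  # True
print(meets_claim_7({"Rv3861", "Rv3862", "Rv3863", "Rv3864"}))  # False
```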

8. A strain according to claim 1 which has integrated a portion of DNA originating
from Mycobacterium tuberculosis or any virulent member of the Mycobacterium
tuberculosis complex (M. africanum, M. bovis, M. canettii), which comprises at
least Rv3871 (SEQ ID No 14), Rv3875 (SEQ ID No 18, ESAT-6) and Rv3876
(SEQ ID No 19).

9. A strain according to claim 1 which has integrated a portion of DNA originating
from Mycobacterium tuberculosis or any virulent member of the Mycobacterium
tuberculosis complex (M. africanum, M. bovis, M. canettii), which comprises at
least Rv3871 (SEQ ID No 14), Rv3875 (SEQ ID No 18, ESAT-6) and Rv3877
(SEQ ID No 20).

10. A strain according to claim 1 which has integrated a portion of DNA originating
from Mycobacterium tuberculosis or any virulent member of the Mycobacterium
tuberculosis complex (M. africanum, M. bovis, M. canettii), which comprises at
least Rv3871 (SEQ ID No 14), Rv3875 (SEQ ID No 18, ESAT-6), Rv3876 (SEQ ID
No 19) and Rv3877 (SEQ ID No 20).
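Claims 8 to 10 each name a minimal core of genes that the integrated DNA must contain, all sharing Rv3871 and Rv3875 (ESAT-6). A hypothetical Python helper that reports which of these cores a given gene set contains might look like this; the dictionary layout and function name are assumptions made for illustration.

```python
from collections.abc import Iterable

# Minimal gene cores named in claims 8, 9 and 10.
CLAIM_CORES = {
    8: frozenset({"Rv3871", "Rv3875", "Rv3876"}),
    9: frozenset({"Rv3871", "Rv3875", "Rv3877"}),
    10: frozenset({"Rv3871", "Rv3875", "Rv3876", "Rv3877"}),
}


def matching_core_claims(genes: Iterable[str]) -> list[int]:
    """Return the claim numbers (8 to 10) whose core gene set is fully contained in genes."""
    present = set(genes)
    return sorted(claim for claim, core in CLAIM_CORES.items() if core <= present)


print(matching_core_claims({"Rv3871", "Rv3874", "Rv3875", "Rv3876"}))  # [8]
print(matching_core_claims({"Rv3871", "Rv3875", "Rv3876", "Rv3877"}))  # [8, 9, 10]
```

Because claim 10's core is the union of the cores of claims 8 and 9, any set that satisfies claim 10 also satisfies the other two.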
11. A strain according to one of claims 8 to 10 which has integrated a portion of DNA
originating from Mycobacterium tuberculosis or any virulent member of the
Mycobacterium tuberculosis complex (M. africanum, M. bovis, M. canettii), which
further comprises Rv3874 (SEQ ID No 17, CFP-10).

12. A strain according to one of claims 8 to 11 which has integrated a portion of DNA
originating from Mycobacterium tuberculosis or any virulent member of the
Mycobacterium tuberculosis complex (M. africanum, M. bovis, M. canettii), which
further comprises Rv3872 (SEQ ID No 15, mycobacterial PE).

13. A strain according to one of claims 8 to 12 which has integrated a portion of DNA
originating from Mycobacterium tuberculosis or any virulent member of the
Mycobacterium tuberculosis complex (M. africanum, M. bovis, M. canettii), which
further comprises Rv3873 (SEQ ID No 16, PPE).

14. A strain according to one of claims 8 to 13 which has integrated a portion of DNA
originating from Mycobacterium tuberculosis or any virulent member of the
Mycobacterium tuberculosis complex (M. africanum, M. bovis, M. canettii), which
further comprises at least one, two, three or four gene(s) selected from Rv3861
(SEQ ID No 4), Rv3862 (SEQ ID No 5), Rv3863 (SEQ ID No 6), Rv3864 (SEQ ID
No 7), Rv3865 (SEQ ID No 8), Rv3866 (SEQ ID No 9), Rv3867 (SEQ ID No 10),
Rv3868 (SEQ ID No 11), Rv3869 (SEQ ID No 12), Rv3870 (SEQ ID No 13),
Rv3878 (SEQ ID No 21), Rv3879 (SEQ ID No 22), Rv3880 (SEQ ID No 23),
Rv3881 (SEQ ID No 24), Rv3882 (SEQ ID No 25), Rv3883 (SEQ ID No 26),
Rv3884 (SEQ ID No 27) and Rv3885 (SEQ ID No 28).

15. A strain according to claim 1 which has integrated a portion of DNA originating
from Mycobacterium tuberculosis or any virulent member of the Mycobacterium
tuberculosis complex (M. africanum, M. bovis, M. canettii), which comprises
Rv3875 (SEQ ID No 18, ESAT-6).

16. A strain according to claim 1 which has integrated a portion of DNA originating
from Mycobacterium tuberculosis or any virulent member of the Mycobacterium
tuberculosis complex (M. africanum, M. bovis, M. canettii), which comprises
Rv3874 (SEQ ID No 17, CFP-10).

17. A strain according to claim 1 which has integrated a portion of DNA originating
from Mycobacterium tuberculosis or any virulent member of the Mycobacterium
tuberculosis complex (M. africanum, M. bovis, M. canettii), which comprises both
Rv3875 (SEQ ID No 18, ESAT-6) and Rv3874 (SEQ ID No 17, CFP-10).

18. A strain according to one of claims 4 to 17, wherein the coding sequence of the
integrated gene is in frame with its natural promoter or with an exogenous promoter,
such as a promoter capable of directing a high level of expression of said coding
sequence.

19. A strain according to one of claims 4 to 17, wherein said integrated gene is
mutated so as to maintain the improved immunogenicity while decreasing the
virulence of the strain.

20. A strain according to claim 18 or 19, wherein said strain carries only parts of the
genes coding for ESAT-6 or CFP-10 in a mycobacterial expression vector under the
control of a promoter, more particularly an hsp60 promoter.

21. A strain according to claim 18, wherein said strain carries at least one portion of the
esat-6 gene that codes for immunogenic 20-mer peptides of ESAT-6 active as T-cell
epitopes.
22. A strain according to claim 19, wherein the esat-6 encoding gene is altered by
directed mutagenesis in a way that most of the immunogenic peptides of ESAT-6
remain intact, but the biological functionality of ESAT-6 is lost.

23. A strain according to claim 19, wherein the CFP-10 encoding gene is altered by
directed mutagenesis in a way that most of the immunogenic peptides of CFP-10
remain intact, but the biological functionality of CFP-10 is lost.

24. M. bovis BCG::RD1 strains which have integrated a cosmid herein referred to as
RD1-2F9 and RD1-AP34, contained in the E. coli strains deposited at the CNCM
under the accession numbers I-2831 and I-2832 respectively.

25. M. bovis BCG::RD1 strain which has integrated the insert of the cosmid RD1-AP34,
which corresponds to the 3909 bp fragment of the M. tuberculosis H37Rv genome
from position 4350459 bp to 4354367 bp, cloned as shown in SEQ ID No 3.
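The 3909 bp length stated in claim 25 follows from counting both endpoints of the H37Rv interval (4354367 - 4350459 + 1). A minimal Python check, assuming 1-based inclusive coordinates (the GenBank convention for the AL123456 genome record); the function name is illustrative:

```python
def inclusive_length(start: int, end: int) -> int:
    """Number of base pairs in a genome interval that includes both end positions."""
    if end < start:
        raise ValueError("end position must not precede start position")
    return end - start + 1


# Claim 25: RD1-AP34 insert, M. tuberculosis H37Rv positions 4350459 to 4354367.
print(inclusive_length(4350459, 4354367))  # 3909
```

Subtracting the two positions without adding one would give 3908 bp, one short of the stated fragment length.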

26. M. bovis BCG::RD1 strain which has integrated the insert of the cosmid RD1-2F9
(~32 kb) that covers the region of the M. tuberculosis genome AL123456 from ca.
4337 kb to ca. 4369 kb as shown in SEQ ID No 1.

27. M. microti::RD1 strain which has integrated the insert of the cosmid RD1-AP34,
which corresponds to the 3909 bp fragment of the M. tuberculosis H37Rv genome
from position 4350459 bp to 4354367 bp, cloned as shown in SEQ ID No 3.

28. M. microti::RD1 strain which has integrated the insert of the cosmid RD1-2F9 (~32
kb) that covers the region of the M. tuberculosis genome AL123456 from ca. 4337
kb to ca. 4369 kb as shown in SEQ ID No 1.
29. A method for preparing and selecting improved M. bovis BCG or M. microti strains
defined in any one of claims 1 to 28, comprising a step consisting of modifying said
strains by insertion, deletion or mutation in the integrated portion of DNA
originating from Mycobacterium tuberculosis or any virulent member of the
Mycobacterium tuberculosis complex (M. africanum, M. bovis, M. canettii), more
particularly in the esat-6 or CFP-10 gene, said method leading to strains that are less
virulent for immunodepressed individuals.

30. A cosmid or a plasmid comprising a portion of DNA originating from
Mycobacterium tuberculosis or any virulent member of the Mycobacterium
tuberculosis complex (M. africanum, M. bovis, M. canettii), said portion of DNA
comprising at least one, two, three or more gene(s) selected from Rv3861 (SEQ ID
No 4), Rv3862 (SEQ ID No 5), Rv3863 (SEQ ID No 6), Rv3864 (SEQ ID No 7),
Rv3865 (SEQ ID No 8), Rv3866 (SEQ ID No 9), Rv3867 (SEQ ID No 10), Rv3868
(SEQ ID No 11), Rv3869 (SEQ ID No 12), Rv3870 (SEQ ID No 13), Rv3871 (SEQ
ID No 14), Rv3872 (SEQ ID No 15, mycobacterial PE), Rv3873 (SEQ ID No 16,
PPE), Rv3874 (SEQ ID No 17, CFP-10), Rv3875 (SEQ ID No 18, ESAT-6),
Rv3876 (SEQ ID No 19), Rv3877 (SEQ ID No 20), Rv3878 (SEQ ID No 21),
Rv3879 (SEQ ID No 22), Rv3880 (SEQ ID No 23), Rv3881 (SEQ ID No 24),
Rv3882 (SEQ ID No 25), Rv3883 (SEQ ID No 26), Rv3884 (SEQ ID No 27) and
Rv3885 (SEQ ID No 28).

31. A cosmid or a plasmid comprising a portion of DNA originating from
Mycobacterium tuberculosis or any virulent member of the Mycobacterium
tuberculosis complex (M. africanum, M. bovis, M. canettii), said portion of DNA
comprising at least one, two, three or more gene(s) selected from Rv3867 (SEQ ID
No 10), Rv3868 (SEQ ID No 11), Rv3869 (SEQ ID No 12), Rv3870 (SEQ ID No
13), Rv3871 (SEQ ID No 14), Rv3872 (SEQ ID No 15, mycobacterial PE), Rv3873
(SEQ ID No 16, PPE), Rv3874 (SEQ ID No 17, CFP-10), Rv3875 (SEQ ID No 18,
ESAT-6), Rv3876 (SEQ ID No 19) and Rv3877 (SEQ ID No 20).

32. A cosmid or a plasmid comprising a portion of DNA originating from
Mycobacterium tuberculosis or any virulent member of the Mycobacterium
tuberculosis complex (M. africanum, M. bovis, M. canettii), said portion of DNA
comprising at least one gene selected from Rv3871 (SEQ ID No 14), Rv3872 (SEQ
ID No 15, mycobacterial PE), Rv3873 (SEQ ID No 16, PPE), Rv3874 (SEQ ID No
17, CFP-10), Rv3875 (SEQ ID No 18, ESAT-6) and Rv3876 (SEQ ID No 19).

33. A cosmid or a plasmid according to any of claims 30 to 32 comprising Rv3874
encoding CFP-10, Rv3875 encoding ESAT-6, or both, or a part of them.

34. A cosmid or a plasmid according to any of claims 30 to 33 comprising a mutated 
gene selected among Rv3861 to Rv3885. 

35. A cosmid or a plasmid according to claim 30 which comprises at least four genes
selected from Rv3861 (SEQ ID No 4), Rv3862 (SEQ ID No 5), Rv3863 (SEQ ID
No 6), Rv3864 (SEQ ID No 7), Rv3865 (SEQ ID No 8), Rv3866 (SEQ ID No 9),
Rv3867 (SEQ ID No 10), Rv3868 (SEQ ID No 11), Rv3869 (SEQ ID No 12),
Rv3870 (SEQ ID No 13), Rv3871 (SEQ ID No 14), Rv3872 (SEQ ID No 15,
mycobacterial PE), Rv3873 (SEQ ID No 16, PPE), Rv3874 (SEQ ID No 17,
CFP-10), Rv3875 (SEQ ID No 18, ESAT-6), Rv3876 (SEQ ID No 19), Rv3877 (SEQ ID
No 20), Rv3878 (SEQ ID No 21), Rv3879 (SEQ ID No 22), Rv3880 (SEQ ID No
23), Rv3881 (SEQ ID No 24), Rv3882 (SEQ ID No 25), Rv3883 (SEQ ID No 26),
Rv3884 (SEQ ID No 27) and Rv3885 (SEQ ID No 28), provided that it comprises
Rv3874 (SEQ ID No 17, CFP-10) and/or Rv3875 (SEQ ID No 18, ESAT-6).

36. A cosmid or a plasmid according to claim 30 which comprises at least Rv3871
(SEQ ID No 14), Rv3875 (SEQ ID No 18, ESAT-6) and Rv3876 (SEQ ID No 19).
37. A cosmid or a plasmid according to claim 30 which comprises at least Rv3871
(SEQ ID No 14), Rv3875 (SEQ ID No 18, ESAT-6) and Rv3877 (SEQ ID No 20).

38. A cosmid or a plasmid according to claim 30 comprising at least Rv3871 (SEQ ID
No 14), Rv3875 (SEQ ID No 18, ESAT-6), Rv3876 (SEQ ID No 19) and Rv3877
(SEQ ID No 20).

39. A cosmid or a plasmid according to one of claims 36 to 38 which further comprises
Rv3872 (SEQ ID No 15, mycobacterial PE), Rv3873 (SEQ ID No 16, PPE) and Rv3874
(SEQ ID No 17, CFP-10).

40. A cosmid or a plasmid according to one of claims 36 to 38 which further comprises
at least one, two, three or four gene(s) selected from Rv3861 (SEQ ID No 4),
Rv3862 (SEQ ID No 5), Rv3863 (SEQ ID No 6), Rv3864 (SEQ ID No 7), Rv3865
(SEQ ID No 8), Rv3866 (SEQ ID No 9), Rv3867 (SEQ ID No 10), Rv3868 (SEQ
ID No 11), Rv3869 (SEQ ID No 12), Rv3870 (SEQ ID No 13), Rv3878 (SEQ ID
No 21), Rv3879 (SEQ ID No 22), Rv3880 (SEQ ID No 23), Rv3881 (SEQ ID No
24), Rv3882 (SEQ ID No 25), Rv3883 (SEQ ID No 26), Rv3884 (SEQ ID No 27)
and Rv3885 (SEQ ID No 28).



41. A cosmid herein referred to as RD1-2F9 and RD1-AP34, contained in the E. coli strains
deposited at the CNCM under the accession numbers I-2831 and I-2832 respectively.

42. Use of a cosmid or a plasmid according to one of claims 30 to 41 for transforming
M. bovis BCG or M. microti.

43. A pharmaceutical composition comprising a strain according to one of claims 1 to
27 and a pharmaceutically acceptable carrier.
44. A pharmaceutical composition according to claim 43 containing suitable
pharmaceutically acceptable carriers comprising excipients and auxiliaries which
facilitate processing of the living vaccine into preparations which can be used
pharmaceutically.

45. A pharmaceutical composition according to claim 43 or 44 which is suitable for
intravenous or subcutaneous administration.

46. A vaccine comprising a strain according to one of claims 1 to 28 and a suitable 
carrier. 

47. A product comprising a strain according to one of claims 1 to 28 and at least one
protein selected from ESAT-6 and CFP-10, or an epitope derived therefrom, for separate,
simultaneous or sequential use in treating tuberculosis.

48. The use of a strain according to one of claims 1 to 28 for preparing a medicament or 
a vaccine for preventing or treating tuberculosis. 

49. The use of a strain according to one of claims 1 to 28 as an
adjuvant/immunomodulator for preparing a medicament for the treatment of
superficial bladder cancer.

50. A method for the identification at the species level of members of the M.
tuberculosis complex by means of markers for RD1mic and RD5mic as a molecular
diagnostic test.

51. A method according to claim 50 comprising the use of a primer selected from:

primer esat-6F GTCACGTCCATTCATTCCCT (SEQ ID No 32),
primer esat-6R ATCCCAGTGACGTTGCCTT (SEQ ID No 33),
primer RD1mic flanking region F GCAGTGCAAAGGTGCAGATA (SEQ ID No 34),
primer RD1mic flanking region R GATTGAGACACTTGCCACGA (SEQ ID No 35),
primer RD5mic flanking region F GAATGCCGACGTCATATCG (SEQ ID No 39),
primer RD5mic flanking region R CGGCCACTGAGTTCGATTAT (SEQ ID No 40)
and the complementary sequences of said primers.
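Claims 51 and 53 also cover the complementary sequences of the listed primers. One common way to write a complementary strand is 5' to 3', i.e. as the reverse complement; the Python sketch below assumes that reading and also reports primer length and GC fraction. It handles only unambiguous A/C/G/T bases, and the helper names are illustrative.

```python
# Watson-Crick complement for unambiguous DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complementary strand written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Primers as listed in claim 51 (SEQ ID Nos 32 to 35, 39 and 40).
PRIMERS = {
    "esat-6F": "GTCACGTCCATTCATTCCCT",
    "esat-6R": "ATCCCAGTGACGTTGCCTT",
    "RD1mic flanking F": "GCAGTGCAAAGGTGCAGATA",
    "RD1mic flanking R": "GATTGAGACACTTGCCACGA",
    "RD5mic flanking F": "GAATGCCGACGTCATATCG",
    "RD5mic flanking R": "CGGCCACTGAGTTCGATTAT",
}

for name, primer in PRIMERS.items():
    print(f"{name}: {len(primer)} nt, GC {gc_fraction(primer):.0%}, "
          f"complement 5'-{reverse_complement(primer)}-3'")
```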

52. A diagnostic kit for the identification at the species level of members of the M.
tuberculosis complex comprising DNA probes and primers specifically hybridizing to a
DNA portion of the RD1 or RD5 region of M. tuberculosis, more particularly probes
hybridizing under stringent conditions to a gene selected from Rv3871 (SEQ ID No 14),
Rv3872 (SEQ ID No 15, mycobacterial PE), Rv3873 (SEQ ID No 16, PPE), Rv3874
(SEQ ID No 17, CFP-10), Rv3875 (SEQ ID No 18, ESAT-6), Rv3876 (SEQ ID No 19)
and Rv3877 (SEQ ID No 20), preferably CFP-10 and ESAT-6.

53. A diagnostic kit according to claim 52 comprising a probe or primer selected from:

primer esat-6F GTCACGTCCATTCATTCCCT (SEQ ID No 32),
primer esat-6R ATCCCAGTGACGTTGCCTT (SEQ ID No 33),
primer RD1mic flanking region F GCAGTGCAAAGGTGCAGATA (SEQ ID No 34),
primer RD1mic flanking region R GATTGAGACACTTGCCACGA (SEQ ID No 35),
primer RD5mic flanking region F GAATGCCGACGTCATATCG (SEQ ID No 39),
primer RD5mic flanking region R CGGCCACTGAGTTCGATTAT (SEQ ID No 40)
and the complementary sequences of said primers.
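Flanking primers such as the RD1mic pair bind on either side of a region, so the length of the PCR product depends on how much DNA lies between the two binding sites. The Python sketch below is a naive exact-match in-silico PCR (one strand, no mismatches, no size limit) run on two hypothetical toy templates; the lengths it prints are illustrative only and are not product sizes from the assay claimed here. As a consistency check on the listing, the RD1mic forward primer (SEQ ID No 34) matches positions 3401 to 3420 of SEQ ID No 1 exactly.

```python
from typing import Optional


def reverse_complement(seq: str) -> str:
    """Complementary strand written 5' to 3' (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def amplicon_length(template: str, forward: str, reverse: str) -> Optional[int]:
    """Naive exact-match in-silico PCR on the given strand.

    Returns the product length from the 5' end of the forward primer to the end of
    the reverse primer's binding site, or None if either site is missing.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    site = reverse_complement(reverse).upper()  # reverse primer binds the other strand
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return end + len(site) - start


F = "GCAGTGCAAAGGTGCAGATA"  # RD1mic flanking region F (SEQ ID No 34)
R = "GATTGAGACACTTGCCACGA"  # RD1mic flanking region R (SEQ ID No 35)

# Hypothetical toy templates: identical flanks with or without a 500 bp insert.
flank = "A" * 30
with_insert = F + flank + "C" * 500 + flank + reverse_complement(R)
without_insert = F + flank + flank + reverse_complement(R)
print(amplicon_length(with_insert, F, R), amplicon_length(without_insert, F, R))  # 600 100
```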

54. A diagnostic kit for the identification at the species level of members of the M.
tuberculosis complex comprising at least one, two, three or more antibodies directed to
mycobacterial PE, PPE, CFP-10, ESAT-6.
55. A diagnostic kit according to claim 54 wherein it comprises antibodies directed to
CFP-10 and ESAT-6.

56. Virulence markers associated with RD1 and/or RD5 regions of the genome of M.
tuberculosis or a part of these regions.

57. The use of a strain according to one of claims 1 to 28 as a carrier for the expression
of a molecule or a heterologous antigen that is of therapeutic or prophylactic interest.

58. A purified nucleic acid corresponding to the Mycobacterium DNA inserted in a
cosmid according to any of claims 30 to 41.

59. The purified nucleic acid according to claim 58 which corresponds to the insert of 
cosmid RD1-2F9 or cosmid RD1-AP34. 
SEQUENCE LISTING



<110> INSTITUT PASTEUR 



<120> Identification of virulence associated regions RD1 and
RD5 leading to improve vaccine of M. bovis BCG and M.
microti



<130> D20217 



<150> EP 02/290864 
<151> 2002-04-05 



<160> 75 



<170> PatentIn Ver. 2.1



<210> 1 
<211> 31808 
<212> DNA 

<213> Mycobacterium tuberculosis 



<220> 

<223> Insert of cosmid RD1-2F9 corresponding to sequence
in the genome of Mycobacterium tuberculosis H37Rv



<400> 1 

gatccgacca caccagcccg gcaccccgcg gggatactcg ccccgtccgc cgtccggaga 60
tcgctgcccc gcgccaccgc ctggccggca cgctgctgcc gctacgccac cagggccgcc 120
gcgcctgcct tcagctccac tgcgtccatt gccggacccg gcttggccac gccagccgga 180
ggccccgcca ccgagcacct gggccgaccc cgccctggcg ccgatacgca gtcggacgcg 240
acccggcgag cgtggttggc gacgcatggt gcggctggtc acctttggcc ttgtcggcct 300
gggccggtcg ggcatgcagc gccaggaggc ccaattcgaa gcaacgatac gaaccgtcct 360
gcatggcaac cacaaggtcg ccgtgctggg caaaggaggt gtgggaaaga cgtcggttgc 420
ggcgtgcgtc ggatcgatcc ttgccgaact gcgccagcag gaccgtatcg tcgggatcga 480
cgccgacacc gccttcggca ggctgagcag ccgaatcgat cctcgagcag ctggttcgtt 540
ctgggagctg accaccgaca cgaatctgcg gtccttcacc gatatcaccg cgcgcctggg 600
ccgaaattcc gcgggactgt acgtcctggc aggccagccg gcatccggtc cgcgccgggt 660
gctcgatccg gccatctacc gcgaagccgc cctaaggttg gatcaccatt tcgcaatctc 720
ggtgatcgac tgcggttcct ccatggaggc ggcggtcacc caggaagtat tgcgcgatgt 780
ggatgctctg atcgtggtgt cctcgccctg ggcggatggt gcctccgctg ccgccaacac 840
catcgaatgg ctgtcggatt atggcctgac aggtttgttg cgacgcagca tcgtggtgct 900
caacgattcg gacggacacg ccgacaagcg caccaagtca ttgctggccc aggaattcat 960
cgaccacggg cagcctgtgg tcgaggtgcc cttcgatccc catttgcggc ccgggggggt 1020
catcgatatg agccacgaaa tggccccgac gacgcggctg aaaatcctgc aggtcgccgc 1080
gacggtgacg gcgtacttcg cgtcgcgacc cgccgacgca cacggcagcc cgccccggtg 1140
acctggctgg ctgacccggt cggcaacagc aggatcgccc gagcgcaggc ctgcaaaacg 1200
tcaatctcgg cgcccatcgt cgaatcctgg cgggcgcaac gcggcgcgca atgtggacag 1260
cgcgagaaat cttgtcgatg ttctcgcgct gtccacatcc agggcatctc accgccactg 1320
ttccgcagac ccctcgaacc agcggtccag gcggcggttg cgtcatgccg attgggcaga 1380
cacccggtgg tcgcgcaccg ggtaaccgtt gcgctcggcc agggatcgca gctggcccaa 1440
cgcgaatgcc cgcgcccggc ctgattcggg aattacgacc cctgcccaca gcccttccgc 1500
acccgcggac tcgacggcgt cgcgtgcaca cagccaccgg cgcgggcaag cccggcacag 1560
ggtcttggcc tcgtcgtcgg gagtcgtcgt ccaacgatcg ggatcttgcg tgcaaacgcc 1620
gagcgggacc tcatacaggg cggttactgt catgtctacg ttcctccaga aagcgttgca 1680
ggttgtagcc tctgccgcga aagcgtatcg cattaaccat agcgatgcaa cagtttcctc 1740
ctctgcctgc ctagcggtgc tgcggctccg gttcggcgag ctccgagctc tagtgcgcgc 1800
accgccgagt accagggcat agatcctgtt aatcagctgt gtatctggcc tcgccggcgc 1860
gtatccgacc ccttcgggca gatcttccag gaaaagtgtt ctgacatgcg acagttcagg 1920
tgtgaagtga actgtagcgg cagttcggtt tggctaggaa actatttcca tagcgggccg 1980
tcgcgtcgct agatccaaaa tgtagcgaag tcatagcagt agaagggtgc aacggttagg 2040
atggcgggcg agcggaaagt ctgcccaccg tcccggctag tacccgcgaa taagggatca 2100
acgcagatgt ctaaagcagg gtcgactgtc ggaccggcgc cgctggtcgc gtgcagcggc 2160 
ggcacatcag acgtgattga gccccgtcgc ggtgtcgcga tcattggcca ctcgtgccga 2220 
gtcggcaccc agatcgacga ttctcgaatc tctcagacac atctgcgagc ggtatccgat 2280 
gatggacggt ggcggatcgt cggcaacatc ccgagaggta tgttcgtcgg cggacgacgc 2340 
ggcagctcgg tgaccgtcag cgataagacc ctaatccgat tcggcgatcc ccctggaggc 2400 
aaggcgttga cgttcgaagt cgtcaggccg tcggattccg ctgcacagca cggccgcgta 2460 
caaccatcag cggacctgtc ggacgacccg gcgcacaacg ctgcgccggt cgcaccggac 2520 
cccggcgtgg ttcgcgcagg ggcggccgcg gctgcgcgcc gtcgtgaact tgacatcagc 2580 
caacgcagct tggcggccga cgggatcatc aacgcgggcg cgctcatcgc gttcgagaaa 2640 
ggccgtagtt ggccccggga acggacccgg gcaaaactcg aagaagtgct gcagtggccc 2700 
gctggaacca tcgcgcgaat ccgtcggggc gagcccaccg agcccgcaac aaaccccgac 2760 
gcgtcccccg gactccggcc tgccgacggc ccggcgtcct tgatcgcgca ggctgtcacc 2820
gccgccgtag acggctgcag tctggctatc gcagcgttgc cggcgaccga ggaccccgag 2880 
ttcaccgaac gtgccgcgcc gatccttgct gatttgcgcc agctcgaggc gattgccgtc 2940 
caagcaaccc gcatcagccg gattaccccg gaattgatca aggcgttggg cgcggtacgt 3000 
cgccaccacg acgaattaat gaggctggga gcaaccgccc ctggtgccac actggcgcag 3060 
cgcttatatg ccgcacggcg gcgcgcgaac ctttccaccc tggagactgc ccaagcggcc 3120 
ggcgtcgcag aagaaatgat cgtcggcgcc gaagccgagg aagagttgcc agccgaggcc 3180 
accgaagcga tcgaagcact gatccgtcag atcaattgag gtcggctccg agcgtcccac 3240 
aagtacaggc acgccgtaac gctcaagttc aacggtccgg ggaacgcgcg cgttctccgg 3300 
cgtttgacgg tgcgttccat cgtgccgcga acttgaaaac gccagcgtca ccaaaaaatt 3360 
cgtgcaccaa cccccctccg agcgctgcta agctcaatgt gcagtgcaaa ggtgcagata 3420 
atgatggcgc accggaacgg cgagcgtaag gaaacacata aatggcatcg ggtagcggtc 3480 
tttgcaagac gacgagtaac tttatttggg gccagttact cttgcttgga gagggaatcc 3540 
ccgacccagg cgacattttc aacaccggtt cgtcgctgtt caaacaaatc agcgacaaaa 3600 
tgggactcgc cattccgggc accaactgga tcggccaagc ggcggaagct tacctaaacc 3660 
agaacatcgc gcaacaactt cgcgcacagg tgatgggcga tctcgacaaa ttaaccggca 3720 
acatgatctc gaatcaggcc aaatacgtct ccgatacgcg cgacgtcctg cgggccatga 3780 
agaagatgat tgacggtgtc tacaaggttt gtaagggcct cgaaaagatt ccgctgctcg 3840
gccacttgtg gtcgtgggag ctcgcaatcc ctatgtccgg catcgcgatg gccgttgtcg 3900 
gcggcgcatt gctctatcta acgattatga cgctgatgaa tgcgaccaac ctgaggggaa 3960 
ttctcggcag gctgatcgag atgttgacga ccttgccaaa gttccccggc ctgcccgggt 4020 
tgcccagcct gcccgacatc atcgacggcc tctggccgcc gaagttgccc gacattccga 4080 
tccccggcct gcccgacatc ccgggcctac ccgacttcaa atggccgccc acccccggca 4140 
gcccgttgtt ccccgacctc ccgtcgttcc cagggttccc cgggttcccg gagttccccg 4200 
ccatccccgg gttccccgca ctgcccgggt tgcccagcat tcccaacttg ttccccggct 4260 
tgccgggtct gggcgacctg ctgcccggcg taggcgattt gggcaagtta cccacctgga 4320 
ctgagctggc cgctttgcct gacttcttgg gcggcttcgc cggcctgccc agcttgggtt 4380 
ttggcaatct gctcagcttt gccagtttgc ccaccgtggg tcaggtgacc gccaccatgg 4440 
gtcagctgca acagctcgtg gcggccggcg gtggccccag ccaactggcc agcatgggca 4500 
gccaacaagc gcaactgatc tcgtcgcagg cccagcaagg aggccagcag cacgccaccc 4560 
tcgtgagcga caagaaggaa gacgaggaag gcgtggccga ggcggagcgt gcacccatcg 4620 
acgctggcac cgcggccagc caacgggggc aggaggggac cgtcctttga tcggacaccg 4680 
agtcgccagc aggtctgtgc catagcgagt cgaagccata gcgagtagaa agttaaacgt 4740 
agaggagggt tcaacccatg accggatttc tcggtgtcgt gccttcgttc ctgaaggtgc 4800 
tggcgggcat gcacaacgag atcgtgggtg atatcaaaag ggcgaccgat acggtcgccg 4860 
ggattagcgg acgagttcag cttacccatg gttcgttcac gtcgaaattc aatgacacgc 4920 
tgcaagagtt tgagaccacc cgtagcagca cgggcacggg tttgcaggga gtcaccagcg 4980 
gactggccaa taatctgctc gcagccgccg gcgcctacct caaggccgac gatggcctag 5040 
ccggtgttat cgacaagatt ttcggttgat catgacgggt ccgtccgctg caggccgcgc 5100 
gggcaccgcc gacaacgtgg tcggcgtcga ggtaaccatc gacggcatgt tggtgatcgc 5160 
cgatcggtta cacctggttg atttccctgt cacgcttggg attcggccga atatcccgca 5220 
agaggatctg cgagacatcg tctgggaaca ggtgcagcgt gacctcacag cgcaaggggt 5280 
gctcgacctc cacggggagc cccaaccgac ggtcgcggag atggtcgaaa ccctgggcag 5340 
gccagatcgg accttggagg gtcgctggtg gcggcgcgac attggcggcg tcatggtgcg 5400 
cttcgtcgtg tgccgcaggg gcgaccgcca tgtgatcgcg gcgcgcgacg gcgacatgct 5460 
ggtgctgcag ttggtggcgc cgcaggtcgg cttggcgggc atggtgacag cggtgctggg 5520 
gcccgccgaa cccgccaacg tcgaacccct gacgggtgtg gcaaccgagc tagccgaatg 5580 
cacaaccgcg tcccaattga cgcaatacgg tatcgcaccg gcctcggccc gcgtctatgc 5640 
cgagatcgtg ggtaacccga ccggctgggt ggagatcgtt gccagccaac gccaccccgg 5700 
cggcaccacg acgcagaccg acgccgccgc tggcgtcctg gactccaagc tcggtaggct 5760
ggtgtcgctt ccccgccgtg ttggaggcga cctgtacgga agcttcctgc ccggcactca 5820 
gcagaacttg gagcgtgcgc tggacggctt gctagagctg ctccctgcgg gcgcttggct 5880 
agatcacacc tcagatcacg cacaagcctc ctcccgaggc tgacccctca catctccgct 5940 
acgacttcag aaagggacgc catggtggac ccgccgggca acgacgacga ccacggtgat 6000 
ctcgacgccc tcgatttctc cgccgcccac accaacgagg cgtcgccgct ggacgcctta 6060 
gacgactatg cgccggtgca gaccgatgac gccgaaggcg acctggacgc cctccatgcg 6120 
ctcaccgaac gcgacgagga gccggagctg gagttgttca cggtgaccaa ccctcaaggg 6180 
tcggtgtcgg tctcaaccct gatggacggc agaatccagc acgtcgagct gacggacaag 6240 
gcgaccagca tgtccgaagc gcagctggcc gacgagatct tcgttattgc cgatctggcc 6300 
cgccaaaagg cgcgggcgtc gcagtacacg ttcatggtgg agaacatcgg tgaactgacc 6360 
gacgaagacg cagaaggcag cgccctgctg cgggaattcg tggggatgac cctgaatctg 6420 
ccgacgccgg aagaggctgc cgcagccgaa gccgaagtgt tcgccacccg ctacgatgtc 6480 
gactacacct cccggtacaa ggccgatgac tgatcgcttg gccagtctgt tcgaaagcgc 6540 
cgtcagcatg ttgccgatgt cggaggcgcg gtcgctagat ctgttcaccg agatcaccaa 6600 
ctacgacgaa tccgcttgcg acgcatggat cggccggatc cggtgtgggg acaccgaccg 6660 
ggtgacgctg tttcgcgcct ggtattcgcg ccgcaatttc ggacagttgt cgggatcggt 6720 
ccagatctcg atgagcacgt taaacgccag gattgccatc ggggggctgt acggcgatat 6780 
cacctacccg gtcacctcgc cgctagcgat caccatgggc tttgccgcat gcgaggcagc 6840 
gcaaggcaat tacgccgacg ccatggaggc cttagaggcc gccccggtcg cgggttccga 6900 
gcacctggtg gcgtggatga aggcggttgt ctacggcgcg gccgaacgct ggaccgacgt 6960 
gatcgaccag gtcaagagtg ctgggaaatg gccggacaag tttttggccg gcgcggccgg 7020 
tgtggcgcac ggggttgccg cggcaaacct ggccttgttc accgaagccg aacgccgact 7080 
caccgaggcc aacgactcgc ccgccggtga ggcgtgtgcg cgcgccatcg cctggtatct 7140 
ggcgatggca cggcgcagcc agggcaacga aagcgccgcg gtggcgctgc tggaatggtt 7200 
acagaccact caccccgagc ccaaagtggc tgcggcgctg aaggatccct cctaccggct 7260 
gaagacgacc accgccgaac agatcgcatc ccgcgccgat ccctgggatc cgggcagtgt 7320 
cgtgaccgac aactccggcc gggagcggct gctcgccgag gcccaagccg aactcgaccg 7380 
ccaaattggg ctcacccggg ttaaaaatca gattgaacgc taccgcgcgg cgacgctgat 7440 
ggcccgggtc cgcgccgcca agggtatgaa ggtcgcccag cccagcaagc acatgatctt 7500 
caccggaccg cccggtaccg gcaagaccac gatcgcgcgg gtggtggcca atatcctggc 7560 
cggcttaggc gtcattgccg aacccaaact cgtcgagacg tcgcgcaagg acttcgtcgc 7620 
cgagtacgag gggcaatcgg cggtcaagac cgctaagacg atcgatcagg cgctgggcgg 7680 
ggtgcttttc atcgacgagg cttatgcgct ggtgcaggaa agagacggcc gcaccgatcc 7740 
gttcggtcaa gaggcgctgg acacgctgct ggcgcggatg gagaacgacc gggaccggct 7800 
ggtggtgatc atcgccgggt acagctccga catagatcgg ctgctggaaa ccaacgaggg 7860 
tctgcggtcg cggttcgcca ctcgcatcga gttcgacacc tattcccccg aggaactcct 7920 
cgagatcgcc aacgtcattg ccgctgctga tgattcggcg ttgaccgcag aggcggccga 7980 
gaactttctt caggccgcca agcagttgga gcagcgcatg ttgcgcggcc ggcgcgccct 8040 
ggacgtcgcc ggcaacggtc ggtatgcgcg ccagctggtg gaggccagcg agcaatgccg 8100 
ggacatgcgt ctagcccagg tcctcgatat cgacaccctc gacgaagacc ggcttcgcga 8160 
gatcaacggc tcagatatgg cggaggctat cgccgcggtg cacgcacacc tcaacatgag 8220 
agaatgaact atggggcttc gcctcaccac caaggttcag gttagcggct ggcgttttct 8280 
gctgcgccgg ctcgaacacg ccatcgtgcg ccgggacacc cggatgtttg acgacccgct 8340 
gcagttctac agccgctcga tcgctcttgg catcgtcgtc gcggtcctga ttctggcggg 8400 
tgccgcgctg ctggcgtact tcaaaccaca aggcaaactc ggcggcacca gcctgttcac 8460 
cgaccgcgcg accaaccagc tttacgtgct gctgtccgga cagttgcatc cggtctacaa 8520 
cctgacttcg gcgcggctgg tgctgggcaa tccggccaac ccggccaccg tgaagtcctc 8580 
cgaactgagc aagctgccga tgggccagac cgttggaatc cccggcgccc cctacgccac 8640 
gcctgtttcg gcgggcagca cctcgatctg gaccctatgc gacaccgtcg cccgagccga 8700 
ctccacttcc ccggtagtgc agaccgcggt catcgcgatg ccgttggaga tcgatgcttc 8760 
gatcgatccg ctccagtcac acgaagcggt gctggtgtcc taccagggcg aaacctggat 8820 
cgtcacaact aagggacgcc acgccataga tctgaccgac cgcgccctca cctcgtcgat 8880 
ggggataccg gtgacggcca ggccaacccc gatctcggag ggcatgttca acgcgctgcc 8940 
tgatatgggg ccctggcagc tgccgccgat accggcggcg ggcgcgccca attcgcttgg 9000 
cctacctgat gatctagtga tcggatcggt cttccagatc cacaccgaca agggcccgca 9060 
atactatgtg gtgctgcccg acggcatcgc gcaggtcaac gcgacaaccg ctgcggcgct 9120 
gcgcgccacc caggcgcacg ggctggtcgc gccaccggca atggtgccca gtctggtcgt 9180 
cagaatcgcc gaacgggtat acccctcacc gctacccgat gaaccgctca agatcgtgtc 9240 
ccggccgcag gatcccgcgc tgtgctggtc atggcaacgc agcgccggcg accagtcgcc 9300 
gcagtcaacg gtgctgtccg gccggcatct gccgatatcg ccctcagcga tgaacatggg 9360 
gatcaagcag atccacggga cggcgaccgt ttacctcgac ggcggaaaat tcgtggcact 9420
gcaatccccc gatcctcgat acaccgaatc gatgtactac atcgatccac agggcgtgcg 9480 
ttatggggtg cctaacgcgg agacagccaa gtcgctgggc ctgagttcac cccaaaacgc 9540 
gccctgggag atcgttcgtc tcctggtcga cggtccggtg ctgtcgaaag atgccgcact 9600 
gctcgagcac gacacgctgc ccgctgaccc tagcccccga aaagttcccg ccggagcctc 9660 
cggagccccc tgatgacgac caagaagttc actcccacca ttacccgtgg cccccggttg 9720 
accccgggcg agatcagcct cacgccgccc gatgacctgg gcatcgacat cccaccgtcg 9780 
ggcgtccaaa agatccttcc ctacgtgatg ggtggcgcca tgctcggcat gatcgccatc 9840 
atggtggccg gcggcaccag gcagctgtcg ccgtacatgt tgatgatgcc gctgatgatg 9900 
atcgtgatga tggtcggcgg tctggccggt agcaccggtg gtggcggcaa gaaggtgccc 9960 
gaaatcaacg ccgaccgcaa ggagtacctg cggtatttgg caggactacg cacccgagtg 10020 
acgtcctcgg ccacctctca ggtggcgttc ttctcctacc acgcaccgca tcccgaggat 10080 
ctgttgtcga tcgtcggcac ccaacggcag tggtcccggc cggccaacgc cgacttctat 10140 
gcggccaccc gaatcggtat cggtgaccag ccggcggtgg atcgattatt gaagccggcc 10200 
gtcggcgggg agttggccgc cgccagcgca gcacctcagc cgttcctgga gccggtcagt 10260 
catatgtggg tggtcaagtt tctacgaacc catggattga tccatgactg cccgaaactg 10320 
ctgcaactcc gtacctttcc gactatcgcg atcggcgggg acttggcggg ggcagccggc 10380 
ctgatgacgg cgatgatctg tcacctagcc gtgttccacc caccggacct gctgcagatc 10440 
cgggtgctca ccgaggaacc cgacgacccc gactggtcct ggctcaaatg gcttccgcac 10500 
gtacagcacc agaccgaaac cgatgcggcc gggtccaccc ggctgatctt cacgcgccag 10560 
gaaggtctgt cggacctggc cgcgcgcggg ccacacgcac ccgattcgct tcccggcggc 10620 
ccctacgtag tcgtcgtcga cctgaccggc ggcaaggctg gattcccgcc cgacggtagg 10680 
gccggtgtca cggtgatcac gttgggcaac catcgcggct cggcctaccg catcagggtg 10740 
cacgaggatg ggacggctga tgaccggctc cctaaccaat cgtttcgcca ggtgacatcg 10800 
gtcaccgatc ggatgtcgcc gcagcaagcc agccgtatcg cgcgaaagtt ggccggatgg 10860 
tccatcacgg gcaccatcct cgacaagacg tcgcgggtcc agaagaaggt ggccaccgac 10920 
tggcaccagc tggtcggtgc gcaaagtgtc gaggagataa caccttcccg ctggaggatg 10980 
tacaccgaca ccgaccgtga ccggctaaag atcccgtttg gtcatgaact aaagaccggc 11040 
aacgtcatgt acctggacat caaagagggc gcggaattcg gcgccggacc gcacggcatg 11100 
ctcatcggga ccacggggtc tgggaagtcc gaattcctgc gcaccctgat cctgtcgctg 11160 
gtggcaatga ctcatccaga tcaggtgaat ctcctgctca ccgacttcaa aggtggttca 11220 
accttcctgg gaatggaaaa gcttccgcac actgccgctg tcgtcaccaa catggccgag 11280 
gaagccgagc tcgtcagccg gatgggcgag gtgttgaccg gagaactcga tcggcgccag 11340 
tcgatcctcc gacaggccgg gatgaaagtc ggcgcggccg gagccctgtc cggcgtggcc 11400 
gaatacgaga agtaccgcga acgcggtgcc gacctacccc cgctgccaac gcttttcgtc 11460 
gtcgtcgacg agttcgccga gctgttgcag agtcacccgg acttcatcgg gctgttcgac 11520 
cggatctgcc gcgtcgggcg gtcgctgagg gtccatctgc tgctggctac ccagtcgctg 11580 
cagaccggcg gtgttcgcat cgacaaactg gagccaaacc tgacatatcg aatcgcattg 11640 
cgcaccacca gctctcatga atccaaggcg gtaatcggca caccggaggc gcagtacatc 11700 
accaacaagg agagcggtgt cgggtttctc cgggtcggca tggaagaccc ggtcaagttc 11760 
agcaccttct acatcagtgg gccatacatg ccgccggcgg caggcgtcga aaccaatggt 11820 
gaagccggag ggcccggtca acagaccact agacaagccg cgcgcattca caggttcacc 11880 
gcggcaccgg I?ctcglgga ggcgccgaca ccgtgacccg cgccggcgac gatgcaaagc 11940 
gcagcgatga ggaggagcgg cgccaacggc ccgcgccggc gacgatgcaa agcgcagcga 12000
tgaggaggag cggcgcgcat gactgctgaa ccggaagtac ggacgctgcg cgaggttgtg 12060
ctggaccagc tcggcactgc tgaatcgcgt gcgtacaaga tgtggctgcc 3<=c9"gacc 12120
aatccggtcc cgctcaacga gctcatcgcc cgtgatcggc gacaacccct gcgatttgcc 12180 
ctggggatca tggatgaacc gcgccgccat ctacaggatg tgtggggcgt agacgtttcc 12240 
ggSIcggcg gcaacatcgg tattgggggc gcacctcaaa ccgggaagtc gacgctactg 12300 
caglcgalgg tgatgtcggc cgccgccaca cactcaccgc gcaacgttca g^attgc 12360 
atcgacctag gtggcggcgg gctgatctat ctcgaaaacc ttccacacgt ^ggtggggta 12420 
gccaatcggt ccgagcccga caaggtcaac cgggtggtcg cagagatgca agccgtcatg 12480 
cggcaacggg aaaccacctt caaggaacac cgagtgggct cgatcgggat Staccggcag 12540 
c?Icgtgac| atccaagtca acccgttgcg tccgatccat acggcgacgt ctttctgatc 12600 
atcglcggat ggcccggttt tgtcggcgag ttccccgacc ttgaggggca ggttcaagat 12660
ctglcciccc aggggctggc gttcggcgtc cacgtcatca tctccacgcc acgctggaca 12720 
gailtglagt cgcgtgttcg cgactacctc ggcaccaaga tcgagttccg gcttggtgac 12780
Itcaatgaaa cccagatcga ccggattacc cgcgagatcc cggcgaatcg tccgggtcgg 12840 
gcagtgtcga tggaaaagca ccatctgatg atcggcgtgc ccaggttcga ^ggcgtgcac 12900 
agcgccgala acctggtgga ggcgatcacc gcgggggtga cgcagatcgc "cccagcac 12960 
accgaacagg cacctccggt gcgggtcctg ccggagcgta tccacctgca cgaactcgac 13020 
ccgaacccgc cgggaccaga gtccgactac cgcactcgct gggagattcc gatcggcttg 13080
cgcgagacgg acctgacgcc ggctcactgc cacatgcaca cgaacccgca cctactgatc 13140
ttcggtgcgg ccaaatcggg caagacgacc attgcccacg cgatcgcgcg cgccatttgt 13200
gcccgaaaca gtccccagca ggtgcggttc atgctcgcgg actaccgctc gggcctgctg 13260
gacgcggtgc cggacaccca tctgctgggc gccggcgcga tcaaccgcaa cagcgcgtcg 13320
ctagacgagg ccgttcaagc actggcggtc aacctgaaga agcggttgcc gccgaccgac 13380
ctgacgacgg cgcagctacg ctcgcgttcg tggtggagcg gatttgacgt cgtgcttctg 13440
gtcgacgatt ggcacatgat cgtgggtgcc gccgggggga tgccgccgat ggcaccgctg 13500
gccccgttat tgccggcggc ggcagatatc gggttgcaca tcattgtcac ctgtcagatg 13560
agccaggctt acaaggcaac catggacaag ttcgtcggcg ccgcattcgg gtcgggcgct 13620
ccgacaatgt tcctttcggg cgagaagcag gaattcccat ccagtgagtt caaggtcaag 13680
cggcgccccc ctggccaggc atttctcgtc tcgccagacg gcaaagaggt catccaggcc 13740
ccctacatcg agcctccaga agaagtgttc gcagcacccc caagcgccgg ttaagattat 13800
ttcattgccg gtgtagcagg acccgagctc agcccggtaa tcgagttcgg gcaatgctga 13860
ccatcgggtt tgtttccggc tataaccgaa cggtttgtgt acgggataca aatacaggga 13920
gggaagaagt aggcaaatgg aaaaaatgtc acatgatccg atcgctgccg acattggcac 13980
gcaagtgagc gacaacgctc tgcacggcgt gacggccggc tcgacggcgc tgacgtcggt 14040
gaccgggctg gttcccgcgg gggccgatga ggtctccgcc caagcggcga cggcgttcac 14100
atcggagggc atccaattgc tggcttccaa tgcatcggcc caagaccagc tccaccgtgc 14160
gggcgaagcg gtccaggacg tcgcccgcac ctattcgcaa atcgacgacg gcgccgccgg 14220
cgtcttcgcc gaataggccc ccaacacatc ggagggagtg atcaccatgc tgtggcacgc 14280
aatgccaccg gagctaaata ccgcacggct gatggccggc gcgggtccgg ctccaatgct 14340
tgcggcggcc gcgggatggc agacgctttc ggcggctctg gacgctcagg ccgtcgagtt 14400
gaccgcgcgc ctgaactctc tgggagaagc ctggactgga ggtggcagcg acaaggcgct 14460
tgcggctgca acgccgatgg tggtctggct acaaaccgcg tcaacacagg ccaagacccg 14520
tgcgatgcag gcgacggcgc aagccgcggc atacacccag gccatggcca cgacgccgtc 14580
gctgccggag atcgccgcca accacatcac ccaggccgtc cttacggcca ccaacttctt 14640
cggtatcaac acgatcccga tcgcgttgac cgagatggat tatttcatcc gtatgtggaa 14700
ccaggcagcc ctggcaatgg aggtctacca ggccgagacc gcggttaaca cgcttttcga 14760
gaagctcgag ccgatggcgt cgatccttga tcccggcgcg agccagagca cgacgaaccc 14820
gatcttcgga atgccctccc ctggcagctc aacaccggtt ggccagttgc cgccggcggc 14880
tacccagacc ctcggccaac tgggtgagat gagcggcccg atgcagcagc tgacccagcc 14940
gctgcagcag gtgacgtcgt tgttcagcca ggtgggcggc accggcggcg gcaacccagc 15000
cgacgaggaa gccgcgcaga tgggcctgct cggcaccagt ccgctgtcga accatccgct 15060
ggctggtgga tcaggcccca gcgcgggcgc gggcctgctg cgcgcggagt cgctacctgg 15120
cgcaggtggg tcgttgaccc gcacgccgct gatgtctcag ctgatcgaaa agccggttgc 15180
cccctcggtg atgccggcgg ctgctgccgg atcgtcggcg acgggtggcg ccgctccggt 15240
gggtgcggga gcgatgggcc agggtgcgca atccggcggc tccaccaggc cgggtctggt 15300
cgcgccggca ccgctcgcgc aggagcgtga agaagacgac gaggacgact gggacgaaga 15360
ggacgactgg tgagctcccg taatgacaac agacttcccg gccacccggg ccggaagact 15420
tgccaacatt ttggcgagga aggtaaagag agaaagtagt ccagcatggc agagatgaag 15480
accgatgccg ctaccctcgc gcaggaggca ggtaatttcg agcggatctc cggcgacctg 15540
aaaacccaga tcgaccaggt ggagtcgacg gcaggttcgt tgcagggcca gtggcgcggc 15600
gcggcgggga cggccgccca ggccgcggtg gtgcgcttcc aagaagcagc caataagcag 15660
aagcaggaac tcgacgagat ctcgacgaat attcgtcagg ccggcgtcca atactcgagg 15720
gccgacgagg agcagcagca ggcgctgtcc tcgcaaatgg gcttctgacc cgctaatacg 15780
aaaagaaacg gagcaaaaac atgacagagc agcagtggaa tttcgcgggt atcgaggccg 15840
cggcaagcgc aatccaggga aatgtcacgt ccattcattc cctccttgac gaggggaagc 15900
agtccctgac caagctcgca gcggcctggg gcggtagcgg ttcggaggcg taccagggtg 15960
tccagcaaaa atgggacgcc acggctaccg agctgaacaa cgcgctgcag aacctggcgc 16020
ggacgatcag cgaagccggt caggcaatgg cttcgaccga aggcaacgtc actgggatgt 16080
tcgcataggg caacgccgag ttcgcgtaga atagcgaaac acgggatcgg gcgagttcga 16140
ccttccgtcg gtctcgccct ttctcgtgtt tatacgtttg agcgcactct gagaggttgt 16200
catggcggcc gactacgaca agctcttccg gccgcacgaa ggtatggaag ctccggacga 16260
tatggcagcg cagccgttct tcgaccccag tgcttcgttt ccgccggcgc ccgcatcggc 16320
aaacctaccg aagcccaacg gccagactcc gcccccgacg tccgacgacc tgtcggagcg 16380
gttcgtgtcg gccccgccgc cgccaccccc acccccacct ccgcctccgc caactccgat 16440
gccgatcgcc gcaggagagc cgccctcgcc ggaaccggcc gcatctaaac cacccacacc 16500
ccccatgccc atcgccggac ccgaaccggc cccacccaaa ccacccacac cccccatgcc 16560
catcgccgga cccgaaccgg ccccacccaa accacccaca cctccgatgc ccatcgccgg 16620
acctgcaccc accccaaccg aatcccagtt ggcgcccccc agaccaccga caccacaaac 16680
gccaaccgga gcgccgcagc aaccggaatc accggcgccc cacgtaccct cgcacgggcc 16740
acatcaaccc cggcgcaccg caccagcacc gccctgggca aagatgccaa tcggcgaacc 16800
cccgcccgct ccgtccagac cgtctgcgtc cccggccgaa ccaccgaccc ggcctgcccc 16860
ccaacactcc cgacgtgcgc gccggggtca ccgctatcgc acagacaccg aacgaaacgt 16920
cgggaaggta gcaactggtc catccatcca ggcgcggctg cgggcagagg aagcatccgg 16980
cgcgcagctc gcccccggaa cggagccctc gccagcgccg ttgggccaac cgagatcgta 17040
tctggctccg cccacccgcc ccgcgccgac agaacctccc cccagcccct cgccgcagcg 17100
caactccggt cggcgtgccg agcgacgcgt ccaccccgat ttagccgccc aacatgccgc 17160
ggcgcaacct gattcaatta cggccgcaac cactggcggt cgtcgccgca agcgtgcagc 17220
gccggatctc gacgcgacac agaaatcctt aaggccggcg gccaaggggc cgaaggtgaa 17280
gaaggtgaag ccccagaaac cgaaggccac gaagccgccc aaagtggtgt cgcagcgcgg 17340
ctggcgacat tgggtgcatg cgttgacgcg aatcaacctg ggcctgtcac ccgacgagaa 17400
gtacgagctg gacctgcacg ctcgagtccg ccgcaatccc cgcgggtcgt atcagatcgc 17460
cgtcgtcggt ctcaaaggtg gggctggcaa aaccacgctg acagcagcgt tggggtcgac 17520
gttggctcag gtgcgggccg accggatcct ggctctagac gcggatccag gcgccggaaa 17580
cctcgccgat cgggtagggc gacaatcggg cgcgaccatc gctgatgtgc ttgcagaaaa 17640
agagctgtcg cactacaacg acatccgcgc acacactagc gtcaatgcgg tcaatctgga 17700
agtgctgccg gcaccggaat acagctcggc gcagcgcgcg ctcagcgacg ccgactggca 17760
tttcatcgcc gatcctgcgt cgaggtttta caacctcgtc ttggctgatt gtggggccgg 17820
cttcttcgac ccgctgaccc gcggcgtgct gtccacggtg tccggtgtcg tggtcgtggc 17880
aagtgtctca atcgacggcg cacaacaggc gtcggtcgcg ttggactggt tgcgcaacaa 17940
cggttaccaa gatttggcga gccgcgcatg cgtggtcatc aatcacatca tgccgggaga 18000
acccaatgtc gcagttaaag acctggtgcg gcatttcgaa cagcaagttc aacccggccg 18060
ggtcgtggtc atgccgtggg acaggcacat tgcggccgga accgagattt cactcgactt 18120
gctcgaccct atctacaagc gcaaggtcct cgaattggcc gcagcgctat ccgacgattt 18180
cgagagggct ggacgtcgtt gagcgcacct gctgttgctg ctggtcctac cgccgcgggg 18240
gcaaccgctg cgcggcctgc caccacccgg gtgacgatcc tgaccggcag acggatgacc 18300
gatttggtac tgccagcggc ggtgccgatg gaaacttata ttgacgacac cgtcgcggtg 18360
ctttccgagg tgttggaaga cacgccggct gatgtactcg gcggcttcga ctttaccgcg 18420
caaggcgtgt gggcgttcgc tcgtcccgga tcgccgccgc tgaagctcga ccagtcactc 18480
gatgacgccg gggtggtcga cgggtcactg ctgactctgg tgtcagtcag tcgcaccgag 18540
cgctaccgac cgttggtcga ggatgtcatc gacgcgatcg ccgtgcttga cgagtcacct 18600
gagttcgacc gcacggcatt gaatcgcttt gtgggggcgg cgatcccgct tttgaccgcg 18660
cccgtcatcg ggatggcgat gcgggcgtgg tgggaaactg ggcgtagctt gtggtggccg 18720
ttggcgattg gcatcctggg gatcgctgtg ctggtaggca gcttcgtcgc gaacaggttc 18780
taccagagcg gccacctggc cgagtgccta ctggtcacga cgtatctgct gatcgcaacc 18840
gccgcagcgc tggccgtgcc gttgccgcgc ggggtcaact cgttgggggc gccacaagtt 18900
gccggcgccg ctacggccgt gctgtttttg accttgatga cgcggggcgg ccctcggaag 18960
cgtcatgagt tggcgtcgtt tgccgtgatc accgctatcg cggtcatcgc ggccgccgct 19020
gccttcggct atggatacca ggactgggtc cccgcggggg ggatcgcatt cgggctgttc 19080
attgtgacga atgcggccaa gctgaccgtc gcggtcgcgc ggatcgcgct gccgccgatt 19140
ccggtacccg gcgaaaccgt ggacaacgag gagttgctcg atcccgtcgc gaccccggag 19200
gctaccagcg aagaaacccc gacctggcag gccatcatcg cgtcggtgcc cgcgtccgcg 19260
gtccggctca ccgagcgcag caaactggcc aagcaacttc tgatcggata cgtcacgtcg 19320
ggcaccctga ttctggctgc cggtgccatc gcggtcgtgg tgcgcgggca cttctttgta 19380
cacagcctgg tggtcgcggg tttgatcacg accgtctgcg gatttcgctc gcggctttac 19440
gccgagcgct ggtgtgcgtg ggcgttgctg gcggcgacgg tcgcgattcc gacgggtctg 19500
acggccaaac tcatcatctg gtacccgcac tatgcctggc tgttgttgag cgtctacctc 19560
acggtagccc tggttgcgct cgtggtggtc gggtcgatgg ctcacgtccg gcgcgtttca 19620
ccggtcgtaa aacgaactct ggaattgatc gacggcgcca tgatcgctgc catcattccc 19680
atgctgctgt ggatcaccgg ggtgtacgac acggtccgca atatccggtt ctgagccgga 19740
tcggctgatt ggcggttcct gacagaacat cgaggacacg gcgcaggttt gcataccttc 19800
ggcgcccgac aaattgctgc gattgagcgt gtggcgcgtc cggtaaaatt tgctcgatgg 19860
ggaacacgta taggagatcc ggcaatggct gaaccgttgg ccgtcgatcc caccggcttg 19920
agcgcagcgg ccgcgaaatfc ggccggcctc gtttttccgc agcctccggc gccgatcgcg 19980
gtcagcggaa cggattcggt ggtagcagca atcaacgaga ccatgccaag catcgaatcg 20040
ctggtcagtg acgggctgcc cggcgtgaaa gccgccctga ctcgaacagc atccaacatg 20100
aacgcggcgg cggacgtcta tgcgaagacc gatcagtcac tgggaaccag tttgagccag 20160
tatgcattcg gctcgtcggg cgaaggcctg gctggcgtcg cctcggtcgg tggtcagcca 20220
agtcaggcta cccagctgct gagcacaccc gtgtcacagg tcacgaccca gctcggcgag 20280
acggccgctg agctggcacc ccgtgttgtt gcgacggtgc cgcaactcgt tcagctggct 20340
ccgcacgccg ttcagatgtc gcaaaacgca tcccccatcg ctcagacgat cagtcaaacc 20400
gcccaacagg ccgcccagag cgcgcagggc ggcagcggcc caatgcccgc acagcttgcc 20460 
agcgctgaaa aaccggccac cgagcaagcg gagccggtcc acgaagtgac aaacgacgat 20520
cagggcgacc agggcgacgt gcagccggcc gaggtcgttg ccgcggcacg tgacgaaggc 20580 
gccggcgcat caccgggcca gcagcccggc gggggcgttc ccgcgcaagc catggatacc 20640 
ggagccggtg cccgcccagc ggcgagtccg ctggcggccc ccgtcgatcc gtcgactccg 20700
gcaccctcaa caaccacaac gttgtagacc gggcctgcca gcggctccgt ctcgcacgca 20760
gcgcctgttg ctgtcctggc ctcgtcagca tgcggcggcc agggcccggt cgagcaaccc 20820 
ggtgacgtat tgccagtaca gccagtccgc gacggccaca cgctggacgg ccgcgtcagt 20880
cgcagtgtgc gcttggtgca gggcaatctc ctgtgagtgg gcagcgtagg cccggaacgc 20940
ccgcagatga gcggcctcgc ggccggtagc ggtgctggtc atgggcttca tcagctcgaa 21000 
ccacagcatg tgccgctcat cgcccggtgg attgacatcc accggcgccg gcggcaacaa 21060 
gtcgagcaaa cgctgatcgg tagtgtcggc cagctgagcc gccgccgagg ggtcgacgac 21120 
ctccagccgc gaccggcccg tcattttgcc gctctccgga atgtcatctg gctccagcac 21180 
aatcttggcc acaccgggat ccgaactggc caactgctcc gcggtaccga tcaccgcccg 21240 
cagcgtcatg tcgtggaaag ccgcccaggc ttgcacggcc aaaaccgggt aggtggcaca 21300
gcgtgcaatt tcgtcaaccg ggattgcgtg atccgcgctg gccaagtaca ccttattcgg 21360 
caattccatc ccgtcgggta tgtaggccag cccatagctg ttggccacga cgatggaacc 21420 
gtcggtggtc accgcggtga tccagaagaa cccgtagtcg cccgcgttgt tgtcggacgc 21480
gttgagcgcc gccgcgatgc gtcgcgccaa ccgcagcgca tcaccgcggc cacgctggcg 21540 
ggcgctggca gctgcagtgg cggcgtcgcg tgccgcccga gccgccgaca ccgggatcat 21600 
cgacaccggc gtaccgtcat ctgcagactc gctgcgatcg ggtttgtcga tgtgatcggt 21660 
cgacggcggg cgggcaggag gtgccgtccg cgccgaggcc gcccgcgtgc tcggtgccgc 21720 
cgccttgtcc gaggtagcca ccggcgcccg cccagtggca gcatgcgacc ccgcgcccga 21780 
ggccgcggcc gtacccacgc tcgaacgcgc gcccgctccc acggcggtac cgctcggcgc 21840 
ggcggccgcc gcccgtgcgc ccgggacacc ggacgccgca gccggcgtca ccgacgcggc 21900 
ggattcgtcc gcatgggcag gccccgactg cgtccccccg cccgcatgct ggcccggcac 21960 
accaggttgc tccgccaacg ccgcgggttt gacgtgcggc gccggctcgc cccctggggt 22020 
gcccggtgtt gctggaccag acggaccggg agtggccggt gtaaccggct ggggcccagg 22080 
cgatggcgcc ggtgccggag ccggctgcgg gtgtggagcg ggagctgggg taacgggcgt 22140 
ggccggggtt gccggtgtgg ccggggcgac cgggggggtg accggcgtga tcggggttgg 22200 
ctcgcctggt gtgcccggtt tgaccggggt caccggggtg accggcttgc ccggggtcac 22260 
cggcgtgacg ggagtgccgg gcgttggtgt gatcggagtt accggcgctc ccgggatggg 22320 
tgtgattggg gttcccgggg tgatcggggt tcccggggtg atcggggttc ccggtgtgcc 22380 
cggtgtgccc ggggatggca cgaccagggt aggcacgtct gggggtggcg gcgacttctg 22440 
ctgaagcaaa tcctcgagtg cgttcttcgg aggtttccaa ttcttggatt ccagcacccg 22500 
ctcagcggtc tcggcgacca gactgacatt ggccccatgc gtcgccgtga ccaatgaatt 22560 
gatggcggta tggcgctcat cagcatccag gctagggtca ttctccagga tatcgatctc 22620 
ccgttgagcg ccatccacat tattgccgat atcggattta gcttgctcaa tcaacccggc 22680 
aatatgcctg tgccaggtaa tcaccgtggc gagataatcc tgcagcgtca tcaattgatt 22740 
gatgtttgca cccagggcgc cgttggcagc attggcggcg ccgccggacc ataggccgcc 22800 
ttcgaagacg tggcctttct gctggcggca ggtgtccaat acatcggtga ccctttgcaa 22860 
aacctggcta tattcctggg cccggtcata gaaagtgtct tcatcggctt ccacccagcc 22920 
gcccggatcc agcatctgtc tggcatagct gcccgtcggc ctggtaatac tcatccccta 22980 
ctgccctccc caaaccgcca gatcgcctcg cggatcaccg tccggttggc ctccggcatt 23040 
tcacgccggc tcggccgctg gatccacccc gcgccggtat tcgcagtaac ccgttgaatc 23100 
cgcgcgcatg atgcaccgct tgggcgatca gccgggtggt cacctcgctt gcgctggccg 23160 
cgctgtcgca cggggcgctc ggtggtaacg gacgtcataa ttaaccagcg taaccgaacc 23220 
taagaccagc tagctgcggc aatattggcg accaggacta tggcgccctc cgaacccggc 23280 
cgatccatgt caaaacattg acaatgcgta ctcacgccgt gtcgggcgcg ctgaatgacc 23340 
gcattgcggc gctcattcgg tgcgtagtcg ctaccaccgc aacaatgggc ttaggccatt 23400 
ccttcgttca tcgcgcggga catggccgat aacgcagcgg tcagctgctc gcccgccgcg 23460 
tcgttatacg cggacgccgc ggcctgcgca ttgtgcagcg cctcgttgac ccgctgagcp 23520 
accgcctcgg cacccagctt cttcagcaaa ccatcttcga tgcgcaggcc ggtgagccac 23580 
tggtgcccat tgatcgtcac ttcgacggtc tcggcttcgt cggtggcgcg gaaggatccg 23640 
ttgttcatct gattgagcgt cccgtctagg gccgactgaa accgcgccgc cagcgtcaac 23700 
gcccgggcga catgcgggtc caattcgtcc atgctcactt cgactcctta ctgtcctggc 23760 
gccgacggtt accaatgacg gcctcggtcc atgcccgatc ctcggtgtag agcgcctcgt 23820 
cttcctgctg agaacccttg gacttggcgc ccccttgtcc ctgatgcgcg gcacccatcg 23880 
gcattcccat gccaccgccg cccagcgcgg cgccgccgcc ggcccttccc tggcctaagc 23940 
cggcaatgtc accagcgcca gcgggccgca ccgattcggc gcccccgatc gcggatccca 24000 
acggcgccga cggcaccccg ccgcctccac cgccaccgag cgatgccgct ttgaccgcca 24060
cgtcgcccga cagcgctgcg gcttcccgcc cagccgacgt cagctgcgcc gccgtgtcag 24120 
ccgggaggcc accacccggc gatccggtag gcggaaccat cggtgcggct ggcatcccgg 24180 
taccgggagt cacaccggag ccgtcagacg gcggcatcag gaagccaggg atcaatccct 24240 
gctcttgcgg aggcgggggc gggtcgatct tgatggcggg gggaggcttc ggcgggttta 24300 
ccggttccag ggctgccttg ttgttgtatt cggtcagcac cttctccgac ctctgctgat 24360 
actccgcgta caccgggaga atttggtcgc gggccgaagg gttttccgcg taaagccgtt 24420 
cgagcccgac tatgtcttca taagtcggat gttcccgcct agcccacacg tgcagctgcg 24480 
cgacatattg agcctgcttg gccatcgcag cgctcaattt ggccatgtgg agtatccatt 24540
gccgttgttg atcgagcgaa gcctcgcaag cggtagccgc atcgccttcc cagttgtcaa 24600 
acccccggaa ccgcttgacg tcgccttgca gcgtcaggtt gaaagtgttc cacccatccg 24660 
caaagtgcgc gagcgatgcg ccttggtcgc ccgtttcgag cttccttgcc gcttctttga 24720 
gatccatgaa gttgggttca ccggccgtgg ccaccctcgg cgtatcggtt agttcggccg 24780 
aactgtcccc tccgacggcc ccggccgatt ctgcctgcac agttccttcg ccgtcgttgt 24840 
ccagcgcggt cgcagcctcc tcatcaacct cgccatacgc cttggccgcg ttgcgcagcg 24900 
aggtcgccag acgctgccgc tctttggcac cggccgccag gtattcccgc atgttgtcgg 24960 
cggacaatac cagctgttgg gcggcgtttt tagccgccgt gagttcgcac ggtgtgatgg 25020 
ggacatcagt cggtgggtcc gccatcgggg cctccacctc gttggccctg ttcaaaatct 25080 
cttgctgatc caccgtcacg gtctgcgact gcgtcatatc ggatcatcct ccttagtgct 25140 
atagccatta tcgtcgctaa actgaaaggt tcctgcacta atttgatgcc gcccgttcat 25200 
gccggcatcg cgaacggatc gccctacttc ggcagcgcca tctggtagcg gctttcctcg 25260 
ggtggggaaa cccggcgaat cggcagctgc cgatgccgcg gggtaccgat cacattgtgc 25320 
cgcagaatca cccggtcaat accgggatgc gggccgagat aggtcgtcgc attcggccac 25380
gccaccttta cctcctgccc gatgtgtgcg ccgatcaacc gggcaaattc ctcgaactgt 25440 
ggcccgactg tgaccatcgc acctgccgcc gccgcacgca ccacgaactg ggtgaatgtc 25500 
tgagcgtcac ccaggttgag ggcgatgtcg acatcgtcga agggcatgta gaccgggcat 25560 
cggttcaccg tctcgccgac cagtacccca gctgacccga tcggcagctg gcagtggcgg 25620 
ttggccacca gatgctggcc ttgcagcgcg ggccgctgcc cgccaaatag gcgggcgaag 25680 
cccctgggtg tcttgggctt gtccgccgtg gtcagcaaca ccgtggactg cggggccatc 25740
cccggcgcga cccggactct ggtgatggtg tggtccgcgc gcgccgacca ccatacatcc 25800 
ggacctccgg gcgccgcgta ggcggcagtg taggcatcgc gccccttgat catcgaccat 25860 
ttctcccgca caaagccgat gtcggtggcg tggtcgtagt catcgaagct gcggccacac 25920 
accgcgtcga caccatggct agccagtcga tcggcaatgc gcgtcgcgga cgccaccaaa 25980 
taccgggcca gtcctgcgac gccttcatcg cggcgctgcg ccgatttgcg ggtgcgttcc 26040 
gggtcggcgc gcagcacgat ccaggtccgg cggttcgccg gcgccgggtc tgtcccgatc 26100 
acctgctgat acagactcac cacgtccggc gctgcggtat tgccgacgcg gtagccggct 26160 
gagacgatat cggcctccaa gtcgggacag tgcaccgaca ggagctcctc caccagtccg 26220 
gtgtccagca tgtcgtcggt gtgggcttgc ccgtcgacga tgaccgtcgg cgtgaatggt 26280
cggggaatga gctcgattac ggcgaccaga aactcgcctt gccagcgcac cgcaacgtga 26340
tctcctggct tcacggtggc cccgaccaca ggttctgacg aggaatccgg gggccgtcgg 26400 
cgccgccgca accacgcgta caccgccgcc acccagccgg tgatccggcg gccgtagaaa 26460 
gtgaccgtgg ccacgatgac gcccaacgag gccagcgcaa tccccgccca ccagtagcgc 26520 
gtctccaaga atgcgatgat gcatggcggg gccaacgcgg aggcaagcaa ggcgtgcccg 26580 
gtgctgaacc gcagccctaa aggatttctc atcggcggct cagcgcccgt ctagccagcg 26640
cgcccaggcc cagggccaac gtaaggccga cggccaccaa cgccacagcc gtaatcgggc 26700 
gacgatcggg acccggctcc accaccgggg gtggaagtcg tctgacgttg tatggcgccg 26760 
aagcagggcc gggcggaatg tcccacgtca gcgcggccac cgcatcgatg acgccggcgc 26820 
cgaccaggtc gtcgaccccg cccccggggt gtctcgcggt ggcggtgatc cggtggatga 26880 
tctgcgccgg cgtcaggtcg gggaaccgct gccgaagcag ggccgccaga cccgacacat 26940 
atgccgcggc aaacgaggtg ccggcgatgg gtaccggccc ctcccggcct tgcagcgcat 27000 
tcaccggttc accggtgtcg ccgagcgcga cgatgttttc tgcgggcgcg gccacgtcca 27060 
cccacggtcc gtgcatcgag aacgagctgg gcatcccggt ctggccgata ccgccgacgc 27120 
ttaacaccag cggtgcgtac cacgccgggg tgacaacggt ctgcacattg ttccagccgc 27180 
gtgggtcgcc gggtgtggac gggtccggcg ccggattctg tacgcaatcg ccaccggtgt 27240 
tgccggccgc gaccaccacc accacgcctt tgacgttgac cgcatagtcg atggatgcac 27300 
ccagtgaggt ttcatcgatc ggcctgctca ccttgtagca ggcggcttca ctgatgttga 27360 
tcacacccac gccgaggttg gcggcgtgca ccacggcgcg ggcaagactg cggatggaac 27420 
cggcggccgg ggtggcgttg gggtcattcg ggttggcttg tgagccgacc ggttcgaagg 27480 
cctcagacgt ctgacgtagc gagagcagtc gagcgtcggg cgcgacgccg acgaacccgt 27540 
cggtgggcgc gggccggccc gcgatgatgg atgctgtgag agtcccatgg gcatcacagt 27600 
cagacaggcc gttaccggcc tggtcgacga aatcgccgcc aggttccgcc gggacccgtg 27660 
gcgaagcgtc gacaccggtg tcgatcaccg ccaccgtcac cccggccccg gtcgcgaact 27720
?gtgggcatc ggccacgccc agatacgtgt tgctccacgg cggatcgtgg aacccggacc 27780 
ccggcagcgt ggtgggcgac gcgcacaaaa cgcgctgttc ggtaggctga tccgggcccg 27840 
tcacgtcggg cggcaacgcg cccggatcga tcggcggtgg cgtgatggcc gatgcgggcg 27900 
acgcggtS caacgccagc gccaccgtga tcagaaagat acggtgcact cccagaacac 27960 
tcLtLgt? gagattcatt gcgattcatt gagctgcgtt gctaccttgg gccacttgac 28020 
ggacctgtgt gcattttaga cgtaacggct gggcaaacaa cgctgtcacg cctgggctgg 28080 
Iccgccgcgc cgaccagggc gcgtaggcgc tgtacctgga ccacgccggg ^ctcaacggt 28140 
tttgctaccg cactagccga tatgcggctg otaccaaacg atcgcggcca ^tctcggtt 28200 
gtctgagcac acgctgcgta tcgcggcatc gatgtcggtg gcggtgatga tctgcagatc 28260 
ctgaaccgat accggttggc ccgcacgttt ttgcgcaacc acccgggtgt <==cggaaccc 28320 
ttcggcgcgt tcgatcacgt tgcgggcgaa ccgaccgttt tgcatagcgt <=^taccgtg 28380 
ctgcScacta ggggtggtgt agttacggat ggtggtgacc gcgtcgagga atacctcccg 28440 
tgcggcgtca tcgagctggc tggcgcgcgg tgtagcgtag cggtgtccaa tctcgacgat 28500 
cfccaccggc gaataagact cgaaccgcag ctttcggttg aaccggccag ccaaacccgg 28560 
gttcacggtg aggaattcat ccacctgatc ctcatagccg gccccgatga aacagaagtc 28620 
laatcgilgt g??tccaatt gaaccaggag ttgattgacc gcctccatgc cgatcatgtc 28680 
cggtgttccg tcttgatgac gttcgatcag cgagtagaac tcgtccatga aaatgattcg 28740 
cccgagtgac ttttcgatca gctcgttcgt cttgggtcct gactccccga tgtagtgccc 28800 
acagaagtcc gatcggcgaa cttctcgaat ttcggggtga cgcacgatcc ccatgccggc 28860 
gtaStcttg ccgagcgctt cagcggtggt tgtcttacct gtgcctggtg gccccaccag 28920 
caacatgtgg ttggtctgcc cctccaccgg taggccgtgc tctaggcgca tcatgcgcac 28980 
ctcgag?tS tc??ccagcg ccgataccgc ttgcttgacc gccgccaggc ccacctgttt 29040 
ggclagcagt tcccggccct cggctagcag ctcgccgcgc cgctgcgctg <=a«gtcgtc 29100 
Itcgagctgg tcgcggcttt tcgccgtcga agcatcccaa cggtcggagc ggctggcgat 29160 
ggtfcgttca tcggSLcaa tcaagcgcag gttcgggtcc gccagggctt f "ggcggc 29220 
i?cgg?gagc accccgttga tggtggcctt cgacagccag atctgggcct t9*cctcctc "280 
atgcagttgc cggtacacca tcccccgcac atacgccaag tcggcgacca gcagcggaat 29340 
atcggccggt c^atcgccg cggtgagcac gtcggcgccg aaccgctccg atgacctgct 29400 
gtg??cga?c acgtccaccc ggtccagcca gtccagggcc actcgcccct ^ccgagatg 29460 
ggcggcggcg tgggctgcca gcgcacaaat cgacgcggtc accgccggca tgacgatcgc 29520
SSSSgc agatcctcgg cggccgtcga caacacgtcg ggccatcgct gcgtgacgta 29580 
caSSgaac gcccgagcca gSSgatgcca ctggtagttg cgccacgaat ccaatagctc 29640 
gcggtt^gct Lcagggcat cggccttcgc atactccccc gcgatcgtca -cgccgacga 29700 
calcgccagc cccacctgag atgcgtcggt caccgtgatc ccgatggatg ^cccagctg 29760 
gacc?cagcg gccaacgtcc ggccgatccg cgtggtctcg cggtgcagcc ^^cgctatg 29820 
ggcgttgagc tgcttaagcg aggccagatc gcggtcaccg caggcgatac gacccagcca 29880 
cgcgtcggSc atcgacggat cggcctcggt ggcagccaca aactcaggca acgccgccac 29940 
glatccltgg ccattcttga tcgtcatcgc ccgatcgaaa tgccggcgcg cagtgagtaa 30000 
atcacccatc gtgtccacca ttctcgacat cgccgccgct gtcaccgcgg "gcaacgtg 30060 
tgtctgtcac tctgtgcctc aaattccgtt ggcaacgttc taccggccta tcgacatcgt 30120 
gaccggctca aggctgacat agcggttctc cgcacggaac atttccatct caaccagcca 30180 
gttt?Itcct glcgcaccga ctttcaccgt tgcccgatcg atttgttcga tggtcacctc 30240 
gaagccatgc cgalcgctct cggacagcga ggtaccgggt cgggcaatgg tgatgacact 30300 
ggc?ggccgt ggcgtgggcg aaatcgcgac atcgacaccg ctgccttcag atttgccgtc 30360 
atcgccgtlc tlgcgccgcc gcacgtactc cacgacgccg acagtggtgc gcggcgcggg 30420 
Tolllallls ccgacga?gc Laactgcgg catgcgtacg ctggcccaac gctcttggtc 30480 
gcSSgSc acacalaccc gctcaccggc accgacgacg cgaatcacga tcctcttggc 30540 
ga?cgtgt?g tccgcggcca cgaagacgcg cgacagctca ccggcgtcgg t^acgggaat 30600 
catcaglcgg tccccgttgc tcagcttgcc aatcaacacc cccgacggtc cgatctcggt 30660 
gactSctg gccggcaacg ggcagcgccg ctgtccgcgt aggtgtggac Jtggeccgc. 30720 
catcttggcc gcagccgcgg cggcttgctc accattgagc cgacgcaaga tcacactggg 30780 
cggggtSgc gccggcgtS g?gtgcgcac ggtgatggtc gcggtgcacg tcgcgtccgg 30840 
aKSccSt acg??c?gga Igacctcatc ggcacgcagc gtccaggctt gcgagagaac 30900 
ccgcgacgaa at?gcctcag ccgggtacgc atacgtcgtc atccacccgg <*tcaccgcg 30960 
ga?agctttccagcgctgcg cactcccggc taccgcgtcc gaccccagcc ggcgatcaag 31020 
^cSccaag tctgltglgg tggccagttt ggcgcgcaag ccctgacagc 9-gggagct 31080 
ggcaacgcgt tgggcgaccg aaatggcagc ggccccaacg ctggtacgcc agcgtaaagc 31140 
^gggtgtlg clgltcaccg gaagccgcat gatcagccac gtttcgcgcc 9^cggcata 31200 
cgSggcgta ccgatctccg cgtcatacac ccgcgggtaa tcgccgacgg tgccggttcg 31260 
cgagccgaag gtgacgacgc tgattgaatc gagttccagg tccagcgggt ggcgcagcaa 31320 
=99=909.9= «==..=ga=g= =aat=.=gtt gtogotttc* a=ggtca==g ««||«a= 31380

SSSS S5S2S SSSS S | ^ 

as == « ilS ESS 5s= ss 

cgcccagccc gccgacgcga gcacgaacac tgtccacacc ^cggcgac » » 1740 

gigcgggctg aacccggtca gcttggacgt caacgcgccc tccgtagccg *9^gatt 317 

gcca??gcca gcacaccggt ggccactgcg ccgacgaacc cgatagcgat attgcgcgcc 31800 
cggtgatc 



<210> 2 
<211> 13773 
<212> DNA 

<213> Mycobacterium tuberculosis

<223> Complete DNA sequence of RD1 Rv3867-3877 

<400> 2 t u „ ^ rt2 pnrrrt- caatttctcc 60 

120 



ililii 

sss lillg liHS SIS! S sSss 
IIII SS ss# sra « SIS 

aaacgccagg attgccatcg gggggctgta cggcgatatc ^cctacccgg tea a 

SSSSS =S5S « irs iss ~ 

5S2S 23=3 SESK S £ ? g,™ 
SSSSS K3SS SS=g = ~ -235= 

gggcaacgaa agcgccgcgg tggegctget g-tggtta cag ^ 
caaagtggct gcggcgctga aggatccccc « aa » a tg accgaca actccggccg 
gatcgcatcc cgcgccgatc cctgggatcc gggcagtgtc 9^9 a ^9 tcacccqqqt 
ggagcggctg c?cgccgagg cccaagccga actcgaccgc caaattgggc tcacccgggt 
taaaaatcag attgaacget accgcgcggc 9acgctgatg jcccgggtc 9^9 | 
gggtatgaag gtcgcccagc ccagcaagca ^tgatcttc acegg g fccattgccga 
caagaccacg ategegeggg tggtggcoaa tatcctggcc gg » ggcaategge 
acccaaactc gtcgagacgt cgcgcaagga ct ^gtcgcc ^^gagg 99 174Q 
ggtcaagacc gctaagacga tcgatcaggc gctgggcggg 9tgcttttca teg | || 18Q0 
??atgcgctg gtgcaggaaa gagacggccg =accgatccg ttcggtcaag aggcgctgga 1860
cacgctgctg gcgcggatgg agaacgaccg ggaccggctg gtggtgatca tcgccgggta 1920
cagctccgac atagatcggc tgctggaaac caacgagggt ctgcggtcgc ggttcgccac 1980
tcgcatcgag ttcgacacct attcccccga ggaactcctc gagatcgcca acgtcattgc 2040
cgctgctgat gattcggcgt tgaccgcaga ggcggccgag aactttcttc aggccgccaa 2100
gcagttggag cagcgcatgt tgcgcggccg gcgcgccctg gacgtcgccg gcaacggtcg 2160
gtatgcgcgc cagctggtgg aggccagcga gcaatgccgg gacatgcgtc tagcccaggt 2220
cctcgatatc gacaccctcg acgaagaccg gcttcgcgag atcaacggct cagatatggc 2280
ggaggctatc gccgcggtgc acgcacacct caacatgaga gaatgaacta tggggcttcg 2340
cctcaccacc aaggttcagg ttagcggctg gcgttttctg ctgcgccggc tcgaacacgc 2400
catcgtgcgc cgggacaccc ggatgtttga cgacccgctg cagttctaca gccgctcgat 2460
cgctcttggc atcgtcgtcg cggtcctgat tctggcgggt gccgcgctgc tggcgtactt 2520



180 
240 
300 
360 
420 
480 
540 
600 
660 
720 
780 
840 
900 
960 
1020 
1080 
1140 
1200 
1260 
1320 
1380 
1440 
1500 
1560 
1620 
1680 



caaaccacaa
ttacgtgctg 
gctgggcaat 
gggccagacc 
ctcgatctgg 
gaccgcggtc 
cgaagcggtg 
cgccatagat 
gccaaccccg 
gccgccgata 
cggatcggtc 
cggcatcgcg 
gctggtcgcg 
cccctcaccg 
gtgctggtca 
ccggcatctg 
ggcgaccgtt 
caccgaatcg 
gacagccaag 
cctggtcgac 
cgctgaccct 
aagaagttca 
acgccgcccg 
tacgtgatgg 
cagctgtcgc 
ctggccggta 
gagtacctgc 
gtggcgttct 
caacggcagt 
ggtgaccagc 
gccagcgcag 
ctacgaaccc 
actatcgcga 
cacctagccg 
gacgaccccg 
gatgcggccg 
gcgcgcgggc 
ctgaccggcg 
ttgggcaacc 
gaccggctcc 
cagcaagcca 
gacaagacgt 
caaagtgtcg 
cggctaaaga 
aaagagggcg 
gggaagtccg 
caggtgaatc 
cttccgcaca 
atgggcgagg 
atgaaagtcg 
cgcggtgccg 
ctgttgcaga 
tcgctgaggg 
gacaaactgg 
tccaaggcgg 
gggtttctcc 
ccatacatgc 
cagaccacta 
gcgccgacac 
gccaacggcc 
actgctgaac 



ggcaaactcg 
ctgtccggac 
ccggccaacc 
gttggaatcc 
accctatgcg 
atcgcgatgc 
ctggtgtcct 
ctgaccgacc 
atctcggagg 



ccggcggcgg 
ttccagatcc 
caggtcaacg 
ccaccggcaa 
ctacccgatg 
tggcaacgca 
ccgatatcgc 
tacctcgacg 
atgtactaca 
tcgctgggcc 
ggtccggtgc 
agcccccgaa 
ctcccaccat 
atgacctggg 
gtggcgccat 
cgtacatgtt 
gcaccggtgg 
ggtatttggc 
tctcctacca 
ggtcccggcc 
cggcggtgga 
cacctcagcc 
atggattgat 
tcggcgggga 
tgttccaccc 
actggtcctg 
ggtccacccg 
cacacgcacc 
gcaaggctgg 
atcgcggctc 
ctaaccaatc 
gccgtatcgc 
cgcgggtcca 
aggagataac 
tcccgtttgg 
cggaattcgg 
aattcctgcg 
tcctgctcac 
ctgccgctgt 
tgttgaccgg 
gcgcggccgg 
acctaccccc 
gtcacccgga 
tccatctgct 
agccaaacct 
taatcggcac 

gggtcggcat 
cgccggcggc 
gacaagccgc 
cgtgacccgc 
cgcgccggcg 
cggaagtacg 



gcggcaccag 
agttgcatcc 
cggccaccgt 
ccggcgcccc 
acaccgtcgc 
cgttggagat 
accagggcga 
gcgccctcac 
gcatgttcaa 
gcgcgcccaa 
acaccgacaa 
cgacaaccgc 
tggtgcccag 
aaccgctcaa 
gcgccggcga 
cctcagcgat 
gcggaaaatt 
tcgatccaca 
tgagttcacc 
tgtcgaaaga 
aagttcccgc 
tacccgtggc 
catcgacatc 
gctcggcatg 
gatgatgccg 
tggcggcaag 
aggactacgc 
cgcaccgcat 
ggccaacgcc 
tcgattattg 
gttcctggag 
ccatgactgc 
cttggcgggg 
accggacctg 
gctcaaatgg 
gctgatcttc 
cgattcgctt 
attcccgccc 
ggcctaccgc 
gtttcgccag 
gcgaaagttg 
gaagaaggtg 
accttcccgc 
tcatgaacta 
cgccggaccg 
caccctgatc 
cgacttcaaa 
cgtcaccaac 
agaactcgat 
agccctgtcc 
gctgccaacg 
cttcatcggg 
gctggctacc 
gacatatcga 
accggaggcg 
ggaagacccg 
aggcgtcgaa 
gcgcattcac 
gccggcgacg 
acgatgcaaa 
gacgctgcgc 



cctgttcacc 
ggtctacaac 
gaagtcctcc 
ctacgccacg 
ccgagccgac 
cgatgcttcg 
aacctggatc 
ctcgtcgatg 
cgcgctgcct 
ttcgcttggc 
gggcccgcaa 
tgcggcgctg 
tctggtcgtc 
gatcgtgtcc 
ccagtcgccg 
gaacatgggg 
cgtggcactg 
gggcgtgcgt 
ccaaaacgcg 
tgccgcactg 
cggagcctcc 
ccccggttga 
ccaccgtcgg 
atcgccatca 
ctgatgatga 
aaggtgcccg 
acccgagtga 
cccgaggatc 
gacttctatg 
aagccggccg 
ccggtcagtc 
ccgaaactgc 
gcagccggcc 
ctgcagatcc 
cttccgcacg 
acgcgccagg 
cccggcggcc 
gacggtaggg 
atcagggtgc 
gtgacatcgg 
gccggatggt 
gccaccgact 
tggaggatgt 
aagaccggca 
cacggcatgc 
ctgtcgctgg 
ggtggttcaa 
atggccgagg 
cggcgccagt 
ggcgtggccg 
cttttcgtcg 
ctgttcgacc 
cagtcgctgc 
atcgcattgc 
cagtacatca 
gtcaagttca 
accaatggtg 
aggttcaccg 
atgcaaagcg 
gcgcagcgat 
gaggttgtgc 



gaccgcgcga 
ctgacttcgg 
gaactgagca 
cctgtttcgg 
tccacttccc 



atcgatccgc 
gtcacaacta 
gggataccgg 
gatatggggc 
ctacctgatg 
tactatgtgg 
cgcgccaccc 
agaatcgccg 
cggccgcagg 
cagtcaacgg 
atcaagcaga 
caatcccccg 
tatggggtgc 
ccctgggaga 
ctcgagcacg 
ggagccccct 
ccccgggcga 
gcgtccaaaa 
tggtggccgg 
tcgtgatgat 
aaatcaacgc 
cgtcctcggc 
tgttgtcgat 
cggccacccg 
tcggcgggga 
atatgtgggt 
tgcaactccg 
tgatgacggc 
gggtgctcac 
tacagcacca 
aaggtctgtc 
cctacgtagt 
ccggtgtcac 
acgaggatgg 
tcaccgatcg 
ccatcacggg 
ggcaccagct 
acaccgacac 
acgtcatgta 
tcatcgggac 
tggcaatgac 
ccttcctggg 
aagccgagct 
cgatcctccg 
aatacgagaa 
tcgtcgacga 
ggatctgccg 
agaccggcgg 
gcaccaccag 
ccaacaagga 
gcaccttcta 
aagccggagg 
cggcaccggt 
cagcgatgag 
gaggaggagc 
tggaccagct 



ccaaccagct 
cgcggctggt 
agctgccgat 
cgggcagcac 
cggtagtgca 
tccagtcaca 
agggacgcca 
tgacggccag 
cctggcagct 
atctagtgat 
tgctgcccga 
aggcgcacgg 
aacgggtata 
atcccgcgct 
tgctgtccgg 
tccacgggac 
atcctcgata 
ctaacgcgga 
tcgttcgtct 
acacgctgcc 
gatgacgacc 
gatcagcctc 
gatccttccc 
cggcaccagg 
ggtcggcggt 
cgaccgcaag 
cacctctcag 
cgtcggcacc 
aatcggtatc 
gttggccgcc 
ggtcaagttt 
tacctttccg 
gatgatctgt 
cgaggaaccc 
gaccgaaacc 
ggacctggcc 
cgtcgtcgac 
ggtgatcacg 
gacggctgat 
gatgtcgccg 
caccatcctc 
ggtcggtgcg 
cgaccgtgac 
cctggacatc 
cacggggtct 
tcatccagat 
aatggaaaag 
cgtcagccgg 
acaggccggg 
gtaccgcgaa 
gttcgccgag 
cgtcgggcgg 
tgttcgcatc 
ctctcatgaa 
gagcggtgtc 
catcagtggg 
gcccggtcaa 
tctcgaggag 
gaggagcggc 
ggcgcgcatg 
cggcactgct 



2520 

2580 

2640 

2700 

2760 

2820 

2880 

2940 

3000 

3060 

3120 

3180 

3240 

3300 

3360 

3420 

3480 

3540 

3600 

3660 

3720 

3780 

3840 

3900 

3960 

4020 

4080 

4140 

4200 

4260 

4320 

4380 

4440 

4500 

4560 

4620 

4680 

4740 

4800 

4860 

4920 

4980 

5040 

5100 

5160 

5220 

5280 

5340 

5400 

5460 

5520 

5580 

5640 

5700 

5760 

5820 

5880 

5940 

6000 

6060 

6120 
gaatcgcgtg 
ctcatcgccc 
cgccgccatc 
attgggggcg 
gccgccacac 
ctgatctatc 
aaggtcaacc 
aaggaacacc 
cccgttgcgt 
gtcggcgagt 
ttcggcgtcc 
gactacctcg 
cggattaccc 
catctgatga 
gcgatcaccg 
cgggtcctgc 
tccgactacc 
gctcactgcc 
aagacgacca 
gtgcggttca 
ctgctgggcg 
ctggcggtca 
tcgcgttcgt 
gtgggtgccg 
gcagatatcg 
atggacaagt 
gagaagcagg 
tttctcgtct 
gaagtgttcg 
cccgagctca 
ataaccgaac 
aaaaatgtca 
gcacggcgtg 
ggccgatgag 
ggcttccaat 
cgcccgcacc 
caacacatcg 
cgcacggctg 
gacgctttcg 
gggagaagcc 
ggtctggcta 
agccgcggca 
ccacatcacc 



cgtacaagat 
gtgatcggcg 
tacaggatgt 
cacctcaaac 



actcaccgcg 
tcgaaaacct 
gggtggtcgc 
gagtgggctc 
ccgatccata 
tccccgacct 
acgtcatcat 
gcaccaagat 
gcgagatccc 
tcggcgtgcc 
cgggggtgac 
cggagcgtat 
gcactcgctg 
acatgcacac 
ttgcccacgc 
tgctcgcgga 
ccggcgcgat 
acctgaagaa 
ggtggagcgg 
ccggggggat 
ggttgcacat 
tcgtcggcgc 
aattcccatc 



cgcgttgacc 
ggtctaccag 
gatccttgat 
tggcagctca 
gggtgagatg 
gttcagccag 
gggcctgctc 
cgcgggcgcg 
cacgccgctg 
tgctgccgga 
gggtgcgcaa 
ggagcgtgaa 
aatgacaaca 
ggtaaagaga 
caggaggcag 
gagtcgacgg 
gccgcggtgg 
tcgacgaata 



cgccagacgg 
cagcaccccc 
gcccggtaat 
ggtttgtgta 
catgatccga 
acggccggct 
gtctccgccc 
gcatcggccc 
tattcgcaaa 
gagggagtga 
atggccggcg 
gcggctctgg 
tggactggag 
caaaccgcgt 
tacacccagg 
caggccgtcc 
gagatggatt 
gccgagaccg 
cccggcgcga 
acaccggttg 
agcggcccga 
gtgggcggca 
ggcaccagtc 
ggcctgctgc 
atgtctcagc 
tcgtcggcga 
tccggcggct 
gaagacgacg 
gacttcccgg 
gaaagtagtc 
gtaatttcga 
caggttcgtt 
tgcgcttcca 
ttcgtcaggc 



gtggctgccg 
acaacccctg 
gtggggcgta 
cgggaagtcg 
caacgttcag 
tccacacgtc 
agagatgcaa 
gatcgggatg 
cggcgacgtc 
tgaggggcag 
ctccacgcca 
cgagttccgg 
ggcgaatcgt 
caggttcgac 
gcagatcgct 
ccacctgcac 
ggagattccg 
gaacccgcac 
gatcgcgcgc 
ctaccgctcg 
caaccgcaac 
gcggttgccg 
atttgacgtc 
gccgccgatg 
cattgtcacc 
cgcattcggg 
cagtgagttc 
caaagaggtc 
aagcgccggt 
cgagttcggg 
cgggatacaa 
tcgctgccga 
cgacggcgct 
aagcggcgac 
aagaccagct 
tcgacgacgg 
tcaccatgct 
cgggtccggc 
acgctcaggc 
gtggcagcga 
caacacaggc 
ccatggccac 
ttacggccac 
atttcatccg 
cggttaacac 
gccagagcac 
gccagttgcc 
tgcagcagct 
ccggcggcgg 
cgctgtcgaa 
gcgcggagtc 
tgatcgaaaa 
cgggtggcgc 
ccaccaggcc 
aggacgactg 
ccacccgggc 
cagcatggca 
gcggatctcc 
gcagggccag 
agaagcagcc 
cggcgtccaa 



ccgttgacca 
cgatttgccc 
gacgtttccg 
acgctactgc 
ttctattgca 
ggtggggtag 
gccgtcatgc 
taccggcagc 
tttctgatca 
gttcaagatc 
cgctggacag 
cttggtgacg 
ccgggtcggg 
ggcgtgcaca 
tcccagcaca 
gaactcgacc 
atcggcttgc 
ctactgatct 
gccatttgtg 
ggcctgctgg 
agcgcgtcgc 
ccgaccgacc 
gtgcttctgg 
gcaccgctgg 
tgtcagatga 
tcgggcgctc 
aaggtcaagc 
atccaggccc 
taagattatt 
caatgctgac 
atacagggag 
cattggcacg 
gacgtcggtg 
ggcgttcaca 
ccaccgtgcg 
cgccgccggc 
gtggcacgca 
tccaatgctt 
cgtcgagttg 
caaggcgctt 
caagacccgt 
gacgccgtcg 
caacttcttc 



tatgtggaac 
gcttttcgag 
gacgaacccg 
gccggcggct 
gacccagccg 
caacccagcc 
ccatccgctg 
gctacctggc 
gccggttgcc 
cgctccggtg 
gggtctggtc 
ggacgaagag 
cggaagactt 
gagatgaaga 
ggcgacctga 
tggcgcggcg 
aataagcaga 
tactcgaggg 



atccggtccc 
tggggatcat 
gggccggcgg 
agacgatggt 
tcgacctagg 
ccaatcggtc 
ggcaacggga 
tgcgtgacga 
tcgacggatg 
tggccgccca 
agctgaagtc 
tcaatgaaac 
cagtgtcgat 
gcgccgataa 
ccgaacaggc 
cgaacccgcc 
gcgagacgga 
tcggtgcggc 
cccgaaacag 
acgcggtgcc 
tagacgaggc 
tgacgacggc 
tcgacgattg 
ccccgttatt 
gccaggctta 
cgacaatgtt 
ggcgcccccc 
cctacatcga 
tcattgccgg 
catcgggttt 
ggaagaagta 
caagtgagcg 
accgggctgg 
tcggagggca 
ggcgaagcgg 
gtcttcgccg 
atgccaccgg 
gcggcggccg 
accgcgcgcc 
gcggctgcaa 
gcgatgcagg 
ctgccggaga 
ggtatcaaca 
caggcagccc 
aagctcgagc 
atcttcggaa 
acccagaccc 
ctgcagcagg 
gacgaggaag 
gctggtggat 
gcaggtgggt 
ccctcggtga 
ggtgcgggag 
gcgccggcac 
gacgactggt 
gccaacattt 
ccgatgccgc 
aaacccagat 
cggcggggac 
agcaggaact 
ccgacgagga 



gctcaacgag 
ggatgaaccg 
caacatcggt 
gatgtcggcc 
tggcggcggg 
cgagcccgac 
aaccaccttc 
tccaagtcaa 
gcccggtttt 
ggggctggcg 
gcgtgttcgc 
ccagatcgac 
ggaaaagcac 
cctggtggag 
acctccggtg 
gggaccagag 
cctgacgccg 
caaatcgggc 
tccccagcag 
ggacacccat 
cgttcaagca 
gcagctacgc 
gcacatgatc 
gccggcggcg 
caaggcaacc 
cctttcgggc 
tggccaggca 
gcctccagaa 
tgtagcagga 
gtttccggct 
ggcaaatgga 
acaacgctct 
ttcccgcggg 
tccaattgct 
tccaggacgt 
aataggcccc 
agctaaatac 
cgggatggca 
tgaactctct 
cgccgatggt 
cgacggcgca 
tcgccgccaa 
cgatcccgat 
tggcaatgga 
cgatggcgtc 
tgccctcccc 
tcggccaact 
tgacgtcgtt 
ccgcgcagat 
caggccccag 
cgttgacccg 
tgccggcggc 
cgatgggcca 
cgctcgcgca 
gagctcccgt 
tggcgaggaa 
taccctcgcg 
cgaccaggtg 
ggccgcccag 
cgacgagatc 
gcagcagcag 



6180 

6240 

6300 

6360 

6420 

6480 

6540 

6600 

6660 

6720 

6780 

6840 

6900 

6960 

7020 

7080 

7140 

7200 

7260 

7320 

7380 

7440 

7500 

7560 

7620 

7680 

7740 

7800 

7860 

7920 

7980 

8040 

8100 

8160 

8220 

8280 

8340 

8400 

8460 

8520 

8580 

8640 

8700 

8760 

8820 

8880 

8940 

9000 

9060 

9120 

9180 

9240 

9300 

9360 

9420 

9480 

9540 

9600 

9660 

9720 

9780 
gcgctgtcct 
tgacagagca 
atgtcacgtc 
cggcctgggg 
cggctaccga 
aggcaatggc 
tcgcgtagaa 
tctcgtgttt 
gctcttccgg 
cgaccccagt 
ccagactccg 



gccaccccca 
gccctcgccg 
cgaaccggcc 
cccacccaaa 
atcccagttg 
accggaatca 
accagcaccg 
gtctgcgtcc 
ccggggtcac 
atccatccag 
ggagccctcg 
cgcgccgaca 
gcgacgcgtc 
ggccgcaacc 
gaaatcctta 
gaaggccacg 
gttgacgcga 
tcgagtccgc 
ggctggcaaa 
ccggatcctg 
acaatcgggc 
catccgcgca 
cagctcggcg 
gaggttttac 
cggcgtgctg 
acaacaggcg 
ccgcgcatgc 
cctggtgcgg 
caggcacatt 
caaggtcctc 
agcgcacctg 
accacccggg 
gtgccgatgg 
acgccggctg 
cgtcccggat 
gggtcactgc 
gatgtcatcg 
aatcgctttg 
cgggcgtggt 
atcgctgtgc 
gagtgcctac 
ttgccgcgcg 
ctgtttttga 
gccgtgatca 
gactgggtcc 
ctgaccgtcg 
gacaacgagg 
acctggcagg 
aaactggcca 
ggtgccatcg 



cgcaaatggg 
gcagtggaat 
cattcattcc 
cggtagcggt 
gctgaacaac 
ttcgaccgaa 
tagcgaaaca 
atacgtttga 
ccgcacgaag 
gcttcgtttc 
cccccgacgt 
cccccacctc 
gaaccggccg 
ccacccaaac 
ccacccacac 
gcgcccccca 
ccggcgcccc 
ccctgggcaa 
ccggccgaac 
cgctatcgca 
gcgcggctgc 
ccagcgccgt 
gaacctcccc 
caccccgatt 
actggcggtc 
aggccggcgg 
aagccgccca 
atcaacctgg 
cgcaatcccc 
accacgctga 
gctctagacg 
gcgaccatcg 
cacactagcg 
cagcgcgcgc 
aacctcgtct 
tccacggtgt 
tcggtcgcgt 
gtggtcatca 
catttcgaac 
gcggccggaa 
gaattggccg 
ctgttgctgc 
tgacgatcct 
aaacttatat 
atgtactcgg 
cgccgccgct 
tgactctggt 
acgcgatcgc 
tgggggcggc 
gggaaactgg 
tggtaggcag 
tggtcacgac 

gggtcaactc 

ccttgatgac 
ccgctatcgc 
ccgcgggggg 
cggtcgcgcg 
agttgctcga 
ccatcatcgc 
agcaacttct 
cggtcgtggt 



cttctgaccc 
ttcgcgggta 
ctccttgacg 
tcggaggcgt 
gcgctgcaga 
ggcaacgtca 
cgggatcggg 
gcgcactctg 
gtatggaagc 
cgccggcgcc 
ccgacgacct 
cgcctccgcc 
catctaaacc 



cacccacacc 
ctccgatgcc 
gaccaccgac 
acgtaccctc 
agatgccaat 
caccgacccg 
cagacaccga 
gggcagagga 
tgggccaacc 
ccagcccctc 
tagccgccca 
gtcgccgcaa 
ccaaggggcc 
aagtggtgtc 
gcctgtcacc 
gcgggtcgta 
cagcagcgtt 
cggatccagg 
ctgatgtgct 
tcaatgcggt 
tcagcgacgc 
tggctgattg 
ccggtgtcgt 
tggactggtt 
atcacatcat 
agcaagttca 
ccgagatttc 
cagcgctatc 
tggtcctacc 
gaccggcaga 
tgacgacacc 
cggcttcgac 
gaagctcgac 
gtcagtcagt 
cgtgcttgac 
gatcccgctt 
gcgtagcttg 
cttcgtcgcg 
gtatctgctg 
gttgggggcg 
gcggggcggc 
ggtcatcgcg 
gatcgcattc 
gatcgcgctg 
tcccgtcgcg 
gtcggtgccc 
gatcggatac 
gcgcgggcac 



gctaatacga 
tcgaggccgc 
aggggaagca 
accagggtgt 
acctggcgcg 
ctgggatgtt 
cgagttcgac 
agaggttgtc 
tccggacgat 
cgcatcggca 
gtcggagcgg 
aactccgatg 
acccacaccc 
ccccatgccc 
catcgccgga 
accacaaacg 
gcacgggcca 
cggcgaaccc 
gcctgccccc 
acgaaacgtc 
agcatccggc 
gagatcgtat 
gccgcagcgc 
acatgccgcg 
gcgtgcagcg 
gaaggtgaag 
gcagcgcggc 
cgacgagaag 
tcagatcgcc 
ggggtcgacg 
cgccggaaac 
tgcagaaaaa 
caatctggaa 
cgactggcat 
tggggccggc 
ggtcgtggca 
gcgcaacaac 
gccgggagaa 
acccggccgg 
actcgacttg 
cgacgatttc 
gccgcggggg 
cggatgaccg 
gtcgcggtgc 
tttaccgcgc 
cagtcactcg 
cgcaccgagc 
gagtcacctg 
ttgaccgcgc 
tggtggccgt 
aacaggttct 
atcgcaaccg 
ccacaagttg 
cctcggaagc 
gccgccgctg 
gggctgttca 
ccgccgattc 
accccggagg 
gcgtccgcgg 
gtcacgtcgg 
ttctttgtac 



aaagaaacgg 
ggcaagcgca 
gtccctgacc 
ccagcaaaaa 
gacgatcagc 
cgcatagggc 
cttccgtcgg 
atggcggccg 
atggcagcgc 
aacctaccga 
ttcgtgtcgg 
ccgatcgccg 
cccatgccca 
atcgccggac 
cctgcaccca 



ccaaccggag 
catcaacccc 
ccgcccgctc 
caacactccc 
gggaaggtag 
gcgcagctcg 
ctggctccgc 
aactccggtc 
gcgcaacctg 
ccggatctcg 
aaggtgaagc 
tggcgacatt 
tacgagctgg 
gtcgtcggtc 
ttggctcagg 
ctcgccgatc 
gagctgtcgc 
gtgctgccgg 
ttcatcgccg 
ttcttcgacc 
agtgtctcaa 
ggttaccaag 
cccaatgtcg 
gtcgtggtca 
ctcgacccta 
gagagggctg 
caaccgctgc 
atttggtact 
tttccgaggt 
aaggcgtgtg 
atgacgccgg 
gctaccgacc 
agttcgaccg 
ccgtcatcgg 
tggcgattgg 
accagagcgg 
ccgcagcgct 
ccggcgccgc 
gtcatgagtt 
ccttcggcta 
ttgtgacgaa 
cggtacccgg 
ctaccagcga 
tccggctcac 
gcaccctgat 
acagcctggt 



agcaaaaaca 
atccagggaa 
aagctcgcag 
tgggacgcca 
gaagccggtc 
aacgccgagt 
tctcgccctt 
actacgacaa 
agccgttctt 
agcccaacgg 
ccccgccgcc 
caggagagcc 
tcgccggacc 
ccgaaccggc 
ccccaaccga 
cgccgcagca 
ggcgcaccgc 
cgtccagacc 
gacgtgcgcg 
caactggtcc 
cccccggaac 
ccacccgccc 
ggcgtgccga 
attcaattac 



acgcgacaca 
cccagaaacc 
gggtgcatgc 
acctgcacgc 
tcaaaggtgg 
tgcgggccga 
gggtagggcg 
actacaacga 
caccggaata 
atcctgcgtc 
cgctgacccg 
tcgacggcgc 
atttggcgag 
cagttaaaga 
tgccgtggga 
tctacaagcg 
gacgtcgttg 
gcggcctgcc 
gccagcggcg 
gttggaagac 
ggcgttcgct 
ggtggtcgac 
gttggtcgag 
cacggcattg 
gatggcgatg 
catcctgggg 
ccacctggcc 
ggccgtgccg 
tacggccgtg 
ggcgtcgttt 
tggataccag 
tgcggccaag 
cgaaaccgtg 
agaaaccccg 
cgagcgcagc 
tctggctgcc 
ggtcgcgggt 



9840 
9900 
9960 
10020 
10080 
10140 
10200 
10260 
10320 
10380 
10440 
10500 
10560 
10620 
10680 
10740 
10800 
10860 
10920 
10980 
11040 
11100 
11160 
11220 
11280 
11340 
11400 
11460 
11520 
11580 
11640 
11700 
11760 
11820 
11880 
11940 
12000 
12060 
12120 
12180 
12240 
12300 
12360 
12420 
12480 
12540 
12600 
12660 
12720 
12780 
12840 
12900 
12960 
13020 
13080 
13140 
13200 
13260 
13320 
13380 
13440 



WO 03/085098 



14/66 



PCT/IB03/01789 



ttgatcacga ccgtctgcgg atttcgctcg cggctttacg ccgagcgctg gtgtgcgtgg 13500 
gcgttgctgg cggcgacggt cgcgattccg acgggtctga cggccaaact catcatctgg 13560 
tacccgcact atgcctggct gttgttgagc gtctacctca cggtagccct ggttgcgctc 13620 
gtggtggtcg ggtcgatggc tcacgtccgg cgcgtttcac cggtcgtaaa acgaactctg 13680 
gaattgatcg acggcgccat gatcgctgcc atcattccca tgctgctgtg gatcaccggg 13740 
gtgtacgaca cggtccgcaa tatccggttc tga 



<210> 3 
<211> 3909 
<212> DNA 

<213> Mycobacterium tuberculosis 
<220> 

<223> RD1-AP34 (a 3909 bp fragment of the M. 
tuberculosis H37Rv genome) 

<400> 3 



gaattcccat ccagtgagtt caaggtcaag cggcgccccc ctggccaggc atttctcgtc 60 
tcgccagacg gcaaagaggt catccaggcc ccctacatcg agcctccaga agaagtgttc 120 
gcagcacccc caagcgccgg ttaagattat ttcattgccg gtgtagcagg acccgagctc 180 
agcccggtaa tcgagttcgg gcaatgctga ccatcgggtt tgtttccggc tataaccgaa 240 
cggtttgtgt acgggataca aatacaggga gggaagaagt aggcaaatgg aaaaaatgtc 300 
acatgatccg atcgctgccg acattggcac gcaagtgagc gacaacgctc tgcacggcgt 360 
gacggccggc tcgacggcgc tgacgtcggt gaccgggctg gttcccgcgg gggccgatga 420 
Ltctccgcc caagcggcga cggcgttcac atcggagggc atccaattgc tggcttccaa 480 
tgcatcggcc caagaccagc tccaccgtgc gggcgaagcg gtccaggacg tcgcccgcac 540 
ctattcgcaa atcgacgacg gcgccgccgg cgtcttcgcc gaataggccc ccaacacatc 600 
ggagggagtg atcaccatgc tgtggcacgc aatgccaccg gagctaaata ccgcacggct 660 
gatggccggc gcgggtccgg ctccaatgct tgcggcggcc gcgggatggc agacgctttc 720 
JgcOTCtSg gaclctcagg ccgtcgagtt gaccgcgcgc ctgaactctc tgggagaagc 780 
ctggactgga ggtggcagcg acaaggcgct tgcggctgca acgccgatgg tggtctggct 840 
acaLcclcg ?Lacacagg ccaagacccg tgcgatgcag gcgacggcgc aagccgcggc 900 
Sacacccag gccatggcca cgacgccgtc gctgccggag atcgccgcca accacatca 960 
ccaggccgtc cttacggcca ccaacttctt cggtatcaac acgatcccga tcgcgttgac 1020 
cgaStgiat tatttcatcc gtatgtggaa ccaggcagcc ctggcaatgg ^gtctacca 1080 
ggccgagacc gcggttaaca cgcttttcga gaagctcgag ccgatggcgt cgatccttga 1140 
?ccclgcgcg Igclagagca cgacgaaccc gatcttcgga atgccctccc ctggcagc*c 1200 
aacaccggtt ggccagttgc cgccggcggc tacccagacc ctcggccaac tgggtgagat 1260 
gagcggcccg atgcagcagc tgacccagcc gctgcagcag gtgacgtcgt ^gttcagcca 1320 
Ig?gglcggc accggcggcg gcaacccagc cgacgaggaa gccgcgcaga tgggcctgct 1380 
clgcaccagt ccgctgtcga accatccgct ggctggtgga tcaggcccca gcgcgggcgc 1440 
gScctgc?g cgcgcggagt cgctacctgg cgcaggtggg tcgttgaccc gcacgccgct 1500 
Ia?gtcLaI c?gatcgaaa agccggttgc cccctcggtg atgccggcgg ctgctgccgg 1560 
atcgtcggcg acgggtggcg ccgctccggt gggtgcggga gcgatgggcc agggtgcgca 1620 
atccggSgJ tccaSc^gc cgggtctggt cgcgccggca ccgctcgcgc aggagcgtga 1680 
agaagacgac gaggacgact gggacgaaga ggacgactgg tgagctcccg taatgacaac 1740 
Sac?tcccg IccLccggg ccggaagact tgccaacatt ttggcgagga aggtaaagag 1800 
alaaagtag? ccagcatggc agagatgaag accgatgccg ctaccctcgc gcaggaggca 1860 
gltaaJttcg agcggatctc cggcgacctg aaaacccaga tcgaccaggt 99*gtcgacg 1920 
gcaggttcg? tlcSggcca gtggcgcggc gcggcgggga cggccgccca Bgccgcggtg 1980 
gtgclcttcc aagaagcagc caataagcag aagcaggaac tcgacgagat ^tcgacgaat 2040 
attcgtcagg ccggcgtcca atactcgagg gccgacgagg agcagcagca Q^g^gtcc 2100 
tcacLatgg gcttctgacc cgctaatacg aaaagaaacg gagcaaaaac atgacagagc 2160 
agcag?ggla ?ttcgclggt a?cgaggccg cggcaagcgc aatccaggga aatgtcacgt 2220 
ccat?c??tc cctcct?Sc gaggggaagc agtccctgac caagctcgca g^ggcctggg 2280 
qcggtagcgg ttcggaggcg taccagggtg tccagcaaaa atgggacgcc acggctaccg 2340 
cgcStSaS aacctggcgc ggacgatcag cgaagccggt caggcaatgg 2400 
c^tcgaccga aggcaacgtc actgggatgt tcgcataggg caacgccgag "cgcgtaga 2460 
atagcgaaac aclggatcgg gcgagttcga ccttccgtcg Stctcgccct "ctcgtgtt 2520 
tatacgtttg agcgcactct gagaggttgt catggcggcc gactacgaca agctcttccg 2580 
gccgcacgaa ggtatggaag ctccggacga tatggcagcg cagccgttct tcgaccccag 2640 
tgcttcgttt ccgccggcgc ccgcatcggc aaacctaccg aagcccaacg gccagactcc 2700 
gcccccgacg tccgacgacc tgtcggagcg gttcgtgtcg gccccgccgc cgccaccccc 2760 
acccccacct ccgcctccgc caactccgat gccgatcgcc gcaggagagc cgccctcgcc 2820 
ggaaccggcc gcatctaaac cacccacacc ccccatgccc atcgccggac ccgaaccggc 2880 
cccacccaaa ccacccacac cccccatgcc catcgccgga cccgaaccgg ccccacccaa 2940 
accacccaca cctccgatgc ccatcgccgg acctgcaccc accccaaccg aatcccagtt 3000 
ggcgcccccc agaccaccga caccacaaac gccaaccgga gcgccgcagc aaccggaatc 3060 
accggcgccc cacgtaccct cgcacgggcc acatcaaccc cggcgcaccg caccagcacc 3120 
gccctgggca aagatgccaa tcggcgaacc cccgcccgct ccgtccagac cgtctgcgtc 3180 
cccggccgaa ccaccgaccc ggcctgcccc ccaacactcc cgacgtgcgc gccggggtca 3240 
ccgctatcgc acagacaccg aacgaaacgt cgggaaggta gcaactggtc catccatcca 3300 
ggcgcggctg cgggcagagg aagcatccgg cgcgcagctc gcccccggaa cggagccctc 3360 
gccagcgccg ttgggccaac cgagatcgta tctggctccg cccacccgcc ccgcgccgac 3420 
agaacctccc cccagcccct cgccgcagcg caactccggt cggcgtgccg agcgacgcgt 3480 
ccaccccgat ttagccgccc aacatgccgc ggcgcaacct gattcaatta cggccgcaac 3540 
cactggcggt cgtcgccgca agcgtgcagc gccggatctc gacgcgacac agaaatcctt 3600 
aaggccggcg gccaaggggc cgaaggtgaa gaaggtgaag ccccagaaac cgaaggccac 3660 
gaagccgccc aaagtggtgt cgcagcgcgg ctggcgacat tgggtgcatg cgttgacgcg 3720 
aatcaacctg ggcctgtcac ccgacgagaa gtacgagctg gacctgcacg ctcgagtccg 3780 
ccgcaatccc cgcgggtcgt atcagatcgc cgtcgtcggt ctcaaaggtg gggctggcaa 3840 
aaccacgctg acagcagcgt tggggtcgac gttggctcag gtgcgggccg accggatcct 3900 
ggctctaga 3909 



<210> 4 
<211> 324 
<212> DNA 

<213> Mycobacterium tuberculosis 
<220> 

<223> DNA sequence Rv3861 
<400> 4 

gtgacctggc tggctgaccc ggtcggcaac agcaggatcg cccgagcgca ggcctgcaaa 60 
acgtcaatct cggcgcccat cgtcgaatcc tggcgggcgc aacgcggcgc gcaatgtgga 120 
cagcgcgaga aatcttgtcg atgttctcgc gctgtccaca tccagggcat ctcaccgcca 180 
ctgttccgca gacccctcga accagcggtc caggcggcgg ttgcgtcatg ccgattgggc 240 
agacacccgg tggtcgcgca ccgggtaacc gttgcgctcg gccagggatc gcagctggcc 300 
caacgcgaat gcccgcgccc ggcc 324 



<210> 5 
<211> 348 
<212> DNA 

<213> Mycobacterium tuberculosis 
<220> 

<223> DNA sequence Rv3862c-whiB6 
<400> 5 

atgcgatacg ctttcgcggc agaggctaca acctgcaacg ctttctggag gaacgtagac 60 
atgacagtaa ccgccctgta tgaggtcccg ctcggcgttt gcacgcaaga tcccgatcgt 120 
tggacgacga ctcccgacga cgaggccaag accctgtgcc gggcttgccc gcgccggtgg 180 
ctgtgtgcac gcgacgccgt cgagtccgcg ggtgcggaag ggctgtgggc aggggtcgta 240 
attcccgaat caggccgggc gcgggcattc gcgttgggcc agctgcgatc cctggccgag 300 
cgcaacggtt acccggtgcg cgaccaccgg gtgtctgccc aatcggca 348 



<210> 6 
<211> 1176 
<212> DNA 

<213> Mycobacterium tuberculosis 
<220> 

<223> DNA sequence Rv3863 
<400> 6 

atggcgggcg agcggaaagt ctgcccaccg tcccggctag tacccgcgaa taagggatca 60 
acgcagatgt ctaaagcagg gtcgactgtc ggaccggcgc cgctggtcgc gtgcagcggc 120 
ggcacatcag acgtgattga gccccgtcgc ggtgtcgcga tcattggcca ctcgtgccga 180 
gtcggcaccc agatcgacga ttctcgaatc tctcagacac atctgcgagc ggtatccgat 240 
gatggacggt ggcggatcgt cggcaacatc ccgagaggta tgttcgtcgg cggacgacgc 300 
ggcagctcgg tgaccgtcag cgataagacc ctaatccgat tcggcgatcc ccctggaggc 360 
aaggcgttga cgttcgaagt cgtcaggccg tcggattccg ctgcacagca cggccgcgta 420 
caaccatcag cggacctgtc ggacgacccg gcgcacaacg ctgcgccggt cgcaccggac 480 
cccggcgtgg ttcgcgcagg ggcggccgcg gctgcgcgcc gtcgtgaact tgacatcagc 540 
caacgcagct tggcggccga cgggatcatc aacgcgggcg cgctcatcgc gttcgagaaa 600 
ggccgtagtt ggccccggga acggacccgg gcaaaactcg aagaagtgct gcagtggccc 660 
gctggaacca tcgcgcgaat ccgtcggggc gagcccaccg agcccgcaac aaaccccgac 720 
gcgtcccccg gactccggcc tgccgacggc ccggcgtcct tgatcgcgca ggctgtcacc 780 
gccgccgtag acggctgcag tctggctatc gcagcgttgc cggcgaccga ggaccccgag 840 
ttcaccgaac gtgccgcgcc gatccttgct gatttgcgcc agctcgaggc gattgccgtc 900 
caagcaaccc gcatcagccg gattaccccg gaattgatca aggcgttggg cgcggtacgt 960 
cgccaccacg acgaattaat gaggctggga gcaaccgccc ctggtgccac actggcgcag 1020 
cgcttatatg ccgcacggcg gcgcgcgaac ctttccaccc tggagactgc ccaagcggcc 1080 
ggcgtcgcag aagaaatgat cgtcggcgcc gaagccgagg aagagttgcc agccgaggcc 1140 
accgaagcga tcgaagcact gatccgtcag atcaat 1176 



<210> 7 
<211> 1206 
<212> DNA 

<213> Mycobacterium tuberculosis 
<220> 

<223> DNA sequence Rv3864 
<400> 7 

atggcatcgg gtagcggtct ttgcaagacg acgagtaact ttatttgggg ccagttactc 60 
ttgcttggag agggaatccc cgacccaggc gacattttca acaccggttc gtcgctgttc 120 
aaacaaatca gcgacaaaat gggactcgcc attccgggca ccaactggat cggccaagcg 180 
gcggaagctt acctaaacca gaacatcgcg caacaacttc gcgcacaggt gatgggcgat 240 
ctcgacaaat taaccggcaa catgatctcg aatcaggcca aatacgtctc cgatacgcgc 300 
gacgtcctgc gggccatgaa gaagatgatt gacggtgtct acaaggtttg taagggcctc 360 
gaaaagattc cgctgctcgg ccacttgtgg tcgtgggagc tcgcaatccc tatgtccggc 420 
atcgcgatgg ccgttgtcgg cggcgcattg ctctatctaa cgattatgac gctgatgaat 480 
gcgaccaacc tgaggggaat tctcggcagg ctgatcgaga tgttgacgac cttgccaaag 540 
ttccccggcc tgcccgggtt gcccagcctg cccgacatca tcgacggcct ctggccgccg 600 
aagttgcccg acattccgat ccccggcctg cccgacatcc cgggcctacc cgacttcaaa 660 
tggccgccca cccccggcag cccgttgttc cccgacctcc cgtcgttccc agggttcccc 720 
gggttcccgg agttccccgc catccccggg ttccccgcac tgcccgggtt gcccagcatt 780 
cccaacttgt tccccggctt gccgggtctg ggcgacctgc tgcccggcgt aggcgatttg 840 
ggcaagttac ccacctggac tgagctggcc gctttgcctg acttcttggg cggcttcgcc 900 
ggcctgccca gcttgggttt tggcaatctg ctcagctttg ccagtttgcc caccgtgggt 960 
caggtgaccg ccaccatggg tcagctgcaa cagctcgtgg cggccggcgg tggccccagc 1020 
caactggcca gcatgggcag ccaacaagcg caactgatct cgtcgcaggc ccagcaagga 1080 
ggccagcagc acgccaccct cgtgagcgac aagaaggaag acgaggaagg cgtggccgag 1140 
gcggagcgtg cacccatcga cgctggcacc gcggccagcc aacgggggca ggaggggacc 1200 
gtcctt 1206 
<210> 8 
<211> 309 
<212> DNA 

<213> Mycobacterium tuberculosis 
<220> 

<223> DNA sequence Rv3865 



<400> 8 

atgaccggat ttctcggtgt cgtgccttcg ttcctgaagg tgctggcggg catgcacaac 60 
gagatcgtgg gtgatatcaa aagggcgacc gatacggtcg ccgggattag cggacgagtt 120 
cagcttaccc atggttcgtt cacgtcgaaa ttcaatgaca cgctgcaaga gtttgagacc 180 
acccgtagca gcacgggcac gggtttgcag ggagtcacca gcggactggc caataatctg 240 
ctcgcagccg ccggcgccta cctcaaggcc gacgatggcc tagccggtgt tatcgacaag 300 
attttcggt 309 



<210> 9 
<211> 849 
<212> DNA 

<213> Mycobacterium tuberculosis 
<220> 

<223> DNA sequence Rv3866 



<400> 9 

atgacgggtc cgtccgctgc aggccgcgcg ggcaccgccg acaacgtggt cggcgtcgag 60 
gtaaccatcg acggcatgtt ggtgatcgcc gatcggttac acctggttga tttccctgtc 120 
acgcttggga ttcggccgaa tatcccgcaa gaggatctgc gagacatcgt ctgggaacag 180 
gtgcagcgtg acctcacagc gcaaggggtg ctcgacctcc acggggagcc ccaaccgacg 240 
gtcgcggaga tggtcgaaac cctgggcagg ccagatcgga ccttggaggg tcgctggtgg 300 
cggcgcgaca ttggcggcgt catggtgcgc ttcgtcgtgt gccgcagggg cgaccgccat 360 
gtgatcgcgg cgcgcgacgg cgacatgctg gtgctgcagt tggtggcgcc gcaggtcggc 420 
ttggcgggca tggtgacagc ggtgctgggg cccgccgaac ccgccaacgt cgaacccctg 480 
acgggtgtgg caaccgagct agccgaatgc acaaccgcgt cccaattgac gcaatacggt 540 
atcgcaccgg cctcggcccg cgtctatgcc gagatcgtgg gtaacccgac cggctgggtg 600 
gagatcgttg ccagccaacg ccaccccggc ggcaccacga cgcagaccga cgccgccgct 660 
ggcgtcctgg actccaagct cggtaggctg gtgtcgcttc cccgccgtgt tggaggcgac 720 
ctgtacggaa gcttcctgcc cggcactcag cagaacttgg agcgtgcgct ggacggcttg 780 
ctagagctgc tccctgcggg cgcttggcta gatcacacct cagatcacgc acaagcctcc 840 
tcccgaggc 849 



<210> 10 
<211> 552 
<212> DNA 

<213> Mycobacterium tuberculosis 
<220> 

<223> DNA sequence Rv3867 



atggtggacc cgccgggcaa cgacgacgac cacggtgatc tcgacgccct cgatttctcc 60 
gccgcccaca ccaacgaggc gtcgccgctg gacgccttag acgactatgc gccggtgcag 120 
accgatgacg ccgaaggcga cctggacgcc ctccatgcgc tcaccgaacg cgacgaggag 180 
ccggagctgg agttgttcac ggtgaccaac cctcaagggt cggtgtcggt ctcaaccctg 240 
atggacggca gaatccagca cgtcgagctg acggacaagg cgaccagcat gtccgaagcg 300 
cagctggccg acgagatctt cgttattgcc gatctggccc gccaaaaggc gcgggcgtcg 360 
cagtacacgt tcatggtgga gaacatcggt gaactgaccg acgaagacgc agaaggcagc 420 
gccctgctgc gggaattcgt ggggatgacc ctgaatctgc cgacgccgga agaggctgcc 480 
gcagccgaag ccgaagtgtt cgccacccgc tacgatgtcg actacacctc ccggtacaag 540 
gccgatgact ga 552 



<210> 11 
<211> 1722 
<212> DNA 

<213> Mycobacterium tuberculosis 



<220> 

<223> DNA sequence Rv3868 



atgactgatc gcttggccag tctgttcgaa agcgccgtca gcatgttgcc gatgtcggag 60 
gcgcggtcgc tagatctgtt caccgagatc accaactacg acgaatccgc ttgcgacgca 120 
tggatcggcc ggatccggtg tggggacacc gaccgggtga cgctgtttcg cgcctggtat 180 
tcgcgccgca atttcggaca gttgtcggga tcggtccaga tctcgatgag cacgttaaac 240 
gccaggattg ccatcggggg gctgtacggc gatatcacct acccggtcac ctcgccgcta 300 
gcgatcacca tgggctttgc cgcatgcgag gcagcgcaag gcaattacgc cgacgccatg 360 
gaggccttag aggccgcccc ggtcgcgggt tccgagcacc tggtggcgtg gatgaaggcg 420 
gttgtctacg gcgcggccga acgctggacc gacgtgatcg accaggtcaa f a ^gctggg 480 
aaatggccgg acaagttttt ggccggcgcg gccggtgtgg cgcacggggt tgccgcggca 540 
aacctggcct tgttcaccga agccgaacgc cgactcaccg aggccaacga ctcgcccgcc 600 
agtgaggcgt gtgcgcgcgc catcgcctgg tatctggcga tggcacggcg cagccagggc 660 
laclaaagcg cclcggtggc gctgctggaa tggttacaga ccactcaccc cgagcccaaa 720 
gtggctgcgg cgctgaagga tccctcctac cggctgaaga cgaccaccgc cgaacagatc 780 
Icatcccgcg ccgatccctg ggatccgggc agtgtcgtga ccgacaactc cggccgggag 840 
cggctgctcg ccgaggccca agccgaactc gaccgccaaa ttgggctcac ccgggttaaa 900 
aatcagattg aacgctaccg cgcggcgacg ctgatggccc gggtccgcgc cgccaagggt 960 
atgaaggtcg cccagcccag caagcacatg atcttcaccg gaccgcccgg ^accggcaag 1020 
accacStcg cgcgggtggt ggccaatatc ctggccggct taggcgtcat tgccgaaccc 1080 
aa"2£3 agS?clcg claggacttc gtcgccgagt acgaggggca atcggcggtc 1140 
aagacclctl agacgatcga tcaggcgctg ggcggggtgc ttttcatcga ^aggcttat 1200 
gcgctggtgc aggaaagaga cggccgcacc gatccgttcg gtcaagaggc gctggacacg 1260 
ctgctlgcgc ggatggagaa cgaccgggac cggctggtgg tgatcatcgc cgggtacagc 1320 
tc^gaStag Scg^tjct ggaaaccaac gagggtctgc ggtcgcggtt cgccactcgc 1380 
atcgagttcg acacctattc ccccgaggaa ctcctcgaga tcgccaacgt cattgccgct 1440 
gctgatgatt cggcgttgac cgcagaggcg gccgagaact ttcttcaggc cgccaagcag 1500 
?tggagcagc gcatgttgcg cggccggcgc gccctggacg tcgccggcaa cggtcggtat 1560 
gclcgccalc ?ggt|gaigc cagcgagcaa tgccgggaca tgcgtctagc ccaggtcctc 1620 
latatcgaca ccctcgacga agaccggctt cgcgagatca acggctcaga tatggcggag 1680 
gctatcgccg cggtgcacgc acacctcaac atgagagaat ga 1722 



<210> 12 
<211> 1443 
<212> DNA 

<213> Mycobacterium tuberculosis 
<220> 

<223> DNA sequence Rv3869 



atggggcttc gcctcaccac caaggttcag gttagcggct ggcgttttct gctgcgccgg 60 
ctcglacacg ccatcgtgcg ccgggacacc cggatgtttg acgacccgct gcagttctac 120 
aoccgctcgl tcgctctlgg catcgtcgtc gcggtcctga ttctggcggg tgccgcgctg 180 
c?ggcgtact tcLaccaca aggcaaactc ggcggcacca gcctgttcac ^ccgcgcg 240 
accaaccagc tttacgtgct gctgtccgga cagttgcatc cggtctacaa cctgacttcg 300 
gcgcggcSg SctgggLa Iccigccaac ccggccaccg tgaagtcctc cgaactgagc 360 
aaacKccga tgggccagac cgttggaatc cccggcgccc cctacgccac gcctgtttcg 420 
gclggcag^a ccSgatctg gacc^atgc gacaccgtcg cccgagccga ctccacttcc 480 
?cSSg?gc agaccgcggt catcgcgatg ccgttggaga tcgatgcttc gatcgatccg 540 
ctccagtcac acgaagcggt gctggtgtcc taccagggcg aaacctggat cgtcacaact 600 
aagggacgcc acgccataga tctgaccgac cgcgccctca cctcgtcgat ggggataccg 660 
gtgacggcca ggccaacccc gatctcggag ggcatgttca acgcgctgcc tgatatgggg 720 
ccctggcagc tgccgccgat accggcggcg ggcgcgccca attcgcttgg cctacctgat 780 
gatctagtga tcggatcggt cttccagatc cacaccgaca agggcccgca atactatgtg 840 
gtgctgcccg acggcatcgc gcaggtcaac gcgacaaccg ctgcggcgct gcgcgccacc 900 
caggcgcacg ggctggtcgc gccaccggca atggtgccca gtctggtcgt cagaatcgcc 960 
gaacgggtat acccctcacc gctacccgat gaaccgctca agatcgtgtc ccggccgcag 1020 
gatcccgcgc tgtgctggtc atggcaacgc agcgccggcg accagtcgcc gcagtcaacg 1080 
gtgctgtccg gccggcatct gccgatatcg ccctcagcga tgaacatggg gatcaagcag 1140 
atccacggga cggcgaccgt ttacctcgac ggcggaaaat tcgtggcact gcaatccccc 1200 
gatcctcgat acaccgaatc gatgtactac atcgatccac agggcgtgcg ttatggggtg 1260 
cctaacgcgg agacagccaa gtcgctgggc ctgagttcac cccaaaacgc gccctgggag 1320 
atcgttcgtc tcctggtcga cggtccggtg ctgtcgaaag atgccgcact gctcgagcac 1380 
gacacgctgc ccgctgaccc tagcccccga aaagttcccg ccggagcctc cggagccccc 1440 
1443 



<210> 13 
<211> 2244 
<212> DNA 

<213> Mycobacterium tuberculosis 
<220> 

<223> DNA sequence Rv3870 
<400> 13 



atgacgacca agaagttcac tcccaccatt acccgtggcc cccggttgac cccgggcgag 60 
atcagcctca cgccgcccga tgacctgggc atcgacatcc caccgtcggg cgtccaaaag 120 
atccttccct acgtgatggg tggcgccatg ctcggcatga tcgccatcat ggtggccggc 180 
ggcaccaggc agctgtcgcc gtacatgttg atgatgccgc tgatgatgat cgtgatgatg 240 
gtcggcggtc tggccggtag caccggtggt ggcggcaaga aggtgcccga aatcaacgcc 300 
gaccgcaagg agtacctgcg gtatttggca ggactacgca cccgagtgac gtcctcggcc 360 
acctctcagg tggcgttctt ctcctaccac gcaccgcatc ccgaggatct gttgtcgatc 420 
gtcggcaccc aacggcagtg gtcccggccg gccaacgccg acttctatgc ggccacccga 480 
atcggtatcg gtgaccagcc ggcggtggat cgattattga agccggccgt cggcggggag 540 
ttggccgccg ccagcgcagc acctcagccg ttcctggagc cggtcagtca tatgtgggtg 600 
gtcaagtttc tacgaaccca tggattgatc catgactgcc cgaaactgct gcaactccgt 660 
acctttccga ctatcgcgat cggcggggac ttggcggggg cagccggcct gatgacggcg 720 
atgatctgtc acctagccgt gttccaccca ccggacctgc tgcagatccg ggtgctcacc 780 
gaggaacccg acgaccccga ctggtcctgg ctcaaatggc ttccgcacgt acagcaccag 840 
accgaaaccg atgcggccgg gtccacccgg ctgatcttca cgcgccagga aggtctgtcg 900 
gacctggccg cgcgcgggcc acacgcaccc gattcgcttc ccggcggccc ctacgtagtc 960 
gtcgtcgacc tgaccggcgg caaggctgga ttcccgcccg acggtagggc cggtgtcacg 1020 
gtgatcacgt tgggcaacca tcgcggctcg gcctaccgca tcagggtgca cgaggatggg 1080 
acggctgatg accggctccc taaccaatcg tttcgccagg tgacatcggt caccgatcgg 1140 
atgtcgccgc agcaagccag ccgtatcgcg cgaaagttgg ccggatggtc catcacgggc 1200 
accatcctcg acaagacgtc gcgggtccag aagaaggtgg ccaccgactg gcaccagctg 1260 
gtcggtgcgc aaagtgtcga ggagataaca ccttcccgct ggaggatgta caccgacacc 1320 
gaccgtgacc ggctaaagat cccgtttggt catgaactaa agaccggcaa cgtcatgtac 1380 
ctggacatca aagagggcgc ggaattcggc gccggaccgc acggcatgct catcgggacc 1440 
acggggtctg ggaagtccga attcctgcgc accctgatcc tgtcgctggt ggcaatgact 1500 
catccagatc aggtgaatct cctgctcacc gacttcaaag gtggttcaac cttcctggga 1560 
atggaaaagc ttccgcacac tgccgctgtc gtcaccaaca tggccgagga agccgagctc 1620 
gtcagccgga tgggcgaggt gttgaccgga gaactcgatc ggcgccagtc gatcctccga 1680 
caggccggga tgaaagtcgg cgcggccgga gccctgtccg gcgtggccga atacgagaag 1740 
taccgcgaac gcggtgccga cctacccccg ctgccaacgc ttttcgtcgt cgtcgacgag 1800 
ttcgccgagc tgttgcagag tcacccggac ttcatcgggc tgttcgaccg gatctgccgc 1860 
cgctgagggt ccatctgctg ctggctaccc agtcgctgca gaccggcggt 1920 
gttcgcatcg acaaactgga gccaaacctg acatatcgaa tcgcattgcg caccaccagc 1980 
tctcatgaat ccaaggcggt aatcggcaca ccggaggcgc agtacatcac caacaaggag 2040 






agcggtgtcg ggtttctccg ggtcggcatg gaagacccgg tcaagttcag caccttctac 2100 
atcagtgggc catacatgcc gccggcggca ggcgtcgaaa ccaatggtga agccggaggg 2160 
cccggtcaac agaccactag acaagccgcg cgcattcaca ggttcaccgc ggcaccggtt 2220 
ctcgaggagg cgccgacacc gtga 2244 



<210> 14 
<211> 1776 
<212> DNA 

<213> Mycobacterium tuberculosis
<220> 

<223> DNA sequence Rv3871 



<400> 14
atgactgctg aaccggaagt acggacgctg cgcgaggttg tgctggacca gctcggcact 60
gctgaatcgc gtgcgtacaa gatgtggctg ccgccgttga ccaatccggt cccgctcaac 120
gagctcatcg cccgtgatcg gcgacaaccc ctgcgatttg ccctggggat catggatgaa 180
ccgcgccgcc atctacagga tgtgtggggc gtagacgttt ccggggccgg cggcaacatc 240
ggtattgggg gcgcacctca aaccgggaag tcgacgctac tgcagacgat ggtgatgtcg 300
gccgccgcca cacactcacc gcgcaacgtt cagttctatt gcatcgacct aggtggcggc 360
gggctgatct atctcgaaaa ccttccacac gtcggtgggg tagccaatcg gtccgagccc 420
gacaaggtca accgggtggt cgcagagatg caagccgtca tgcggcaacg ggaaaccacc 480
ttcaaggaac accgagtggg ctcgatcggg atgtaccggc agctgcgtga cgatccaagt 540
caacccgttg cgtccgatcc atacggcgac gtctttctga tcatcgacgg atggcccggt 600
tttgtcggcg agttccccga ccttgagggg caggttcaag atctggccgc ccaggggctg 660
gcgttcggcg tccacgtcat catctccacg ccacgctgga cagagctgaa gtcgcgtgtt 720
cgcgactacc tcggcaccaa gatcgagttc cggcttggtg acgtcaatga aacccagatc 780
gaccggatta cccgcgagat cccggcgaat cgtccgggtc gggcagtgtc gatggaaaag 840
caccatctga tgatcggcgt gcccaggttc gacggcgtgc acagcgccga taacctggtg 900
gaggcgatca ccgcgggggt gacgcagatc gcttcccagc acaccgaaca ggcacctccg 960
gtgcgggtcc tgccggagcg tatccacctg cacgaactcg acccgaaccc gccgggacca 1020
gagtccgact accgcactcg ctgggagatt ccgatcggct tgcgcgagac ggacctgacg 1080
ccggctcact gccacatgca cacgaacccg cacctactga tcttcggtgc ggccaaatcg 1140
ggcaagacga ccattgccca cgcgatcgcg cgcgccattt gtgcccgaaa cagtccccag 1200
caggtgcggt tcatgctcgc ggactaccgc tcgggcctgc tggacgcggt gccggacacc 1260
catctgctgg gcgccggcgc gatcaaccgc aacagcgcgt cgctagacga ggccgttcaa 1320
gcactggcgg tcaacctgaa gaagcggttg ccgccgaccg acctgacgac ggcgcagcta 1380
cgctcgcgtt cgtggtggag cggatttgac gtcgtgcttc tggtcgacga ttggcacatg 1440
atcgtgggtg ccgccggggg gatgccgccg atggcaccgc tggccccgtt attgccggcg 1500
gcggcagata tcgggttgca catcattgtc acctgtcaga tgagccaggc ttacaaggca 1560
accatggaca agttcgtcgg cgccgcattc gggtcgggcg ctccgacaat gttcctttcg 1620
ggcgagaagc aggaattccc atccagtgag ttcaaggtca agcggcgccc ccctggccag 1680
gcatttctcg tctcgccaga cggcaaagag gtcatccagg ccccctacat cgagcctcca 1740
gaagaagtgt tcgcagcacc cccaagcgcc ggttaa 1776



<210> 15 
<211> 297 
<212> DNA 

<213> Mycobacterium tuberculosis 
<220> 

<223> PE coding sequence (Rv3872)



<400> 15

atggaaaaaa tgtcacatga tccgatcgct gccgacattg gcacgcaagt gagcgacaac 60
gctctgcacg gcgtgacggc cggctcgacg gcgctgacgt cggtgaccgg gctggttccc 120
gcgggggccg atgaggtctc cgcccaagcg gcgacggcgt tcacatcgga gggcatccaa 180
ttgctggctt ccaatgcatc ggcccaagac cagctccacc gtgcgggcga agcggtccag 240
gacgtcgccc gcacctattc gcaaatcgac gacggcgccg ccggcgtctt cgccgaa 297
<210> 16
<211> 1104
<212> DNA

<213> Mycobacterium tuberculosis
<220>

<223> PPE coding sequence (Rv3873)

<400> 16

atgctgtggc acgcaatgcc accggagcta aataccgcac ggctgatggc cggcgcgggt 60
ccggctccaa tgcttgcggc ggccgcggga tggcagacgc tttcggcggc tctggacgct 120
caggccgtcg agttgaccgc gcgcctgaac tctctgggag aagcctggac tggaggtggc 180
agcgacaagg cgcttgcggc tgcaacgccg atggtggtct ggctacaaac cgcgtcaaca 240
caggccaaga cccgtgcgat gcaggcgacg gcgcaagccg cggcatacac ccaggccatg 300
gccacgacgc cgtcgctgcc ggagatcgcc gccaaccaca tcacccaggc cgtccttacg 360
gccaccaact tcttcggtat caacacgatc ccgatcgcgt tgaccgagat ggattatttc 420
atccgtatgt ggaaccaggc agccctggca atggaggtct accaggccga gaccgcggtt 480
aacacgcttt tcgagaagct cgagccgatg gcgtcgatcc ttgatcccgg cgcgagccag 540
agcacgacga acccgatctt cggaatgccc tcccctggca gctcaacacc ggttggccag 600
ttgccgccgg cggctaccca gaccctcggc caactgggtg agatgagcgg cccgatgcag 660
cagctgaccc agccgctgca gcaggtgacg tcgttgttca gccaggtggg cggcaccggc 720
ggcggcaacc cagccgacga ggaagccgcg cagatgggcc tgctcggcac cagtccgctg 780
tcgaaccatc cgctggctgg tggatcaggc cccagcgcgg gcgcgggcct gctgcgcgcg 840
gagtcgctac ctggcgcagg tgggtcgttg acccgcacgc cgctgatgtc tcagctgatc 900
gaaaagccgg ttgccccctc ggtgatgccg gcggctgctg ccggatcgtc ggcgacgggt 960
ggcgccgctc cggtgggtgc gggagcgatg ggccagggtg cgcaatccgg cggctccacc 1020
aggccgggtc tggtcgcgcc ggcaccgctc gcgcaggagc gtgaagaaga cgacgaggac 1080
gactgggacg aagaggacga ctgg 1104



<210> 17 
<211> 300 
<212> DNA 

<213> Mycobacterium tuberculosis 
<220> 

<223> CFP-10 coding sequence (Rv3874) 



<400> 17

atggcagaga tgaagaccga tgccgctacc ctcgcgcagg aggcaggtaa tttcgagcgg 60
atctccggcg acctgaaaac ccagatcgac caggtggagt cgacggcagg ttcgttgcag 120
ggccagtggc gcggcgcggc ggggacggcc gcccaggccg cggtggtgcg cttccaagaa 180
gcagccaata agcagaagca ggaactcgac gagatctcga cgaatattcg tcaggccggc 240
gtccaatact cgagggccga cgaggagcag cagcaggcgc tgtcctcgca aatgggcttc 300



<210> 18 
<211> 285 
<212> DNA 

<213> Mycobacterium tuberculosis 
<220> 

<223> ESAT-6 coding sequence (Rv3875) 



<400> 18

atgacagagc agcagtggaa tttcgcgggt atcgaggccg cggcaagcgc aatccaggga 60
aatgtcacgt ccattcattc cctccttgac gaggggaagc agtccctgac caagctcgca 120
gcggcctggg gcggtagcgg ttcggaggcg taccagggtg tccagcaaaa atgggacgcc 180
acggctaccg agctgaacaa cgcgctgcag aacctggcgc ggacgatcag cgaagccggt 240
caggcaatgg cttcgaccga aggcaacgtc actgggatgt tcgca 285



<210> 19 
<211> 2001 
<212> DNA 

<213> Mycobacterium tuberculosis
<220>

<223> DNA sequence Rv3876

<400> 19
atggcggccg actacgacaa gctcttccgg ccgcacgaag gtatggaagc tccggacgat 60
atggcagcgc agccgttctt cgaccccagt gcttcgtttc cgccggcgcc cgcatcggca 120
aacctaccga agcccaacgg ccagactccg cccccgacgt ccgacgacct gtcggagcgg 180
ttcgtgtcgg ccccgccgcc gccaccccca cccccacctc cgcctccgcc aactccgatg 240
ccgatcgccg caggagagcc gccctcgccg gaaccggccg catctaaacc acccacaccc 300
cccatgccca tcgccggacc cgaaccggcc ccacccaaac cacccacacc ccccatgccc 360
atcgccggac ccgaaccggc cccacccaaa ccacccacac ctccgatgcc catcgccgga 420
cctgcaccca ccccaaccga atcccagttg gcgcccccca gaccaccgac accacaaacg 480
ccaaccggag cgccgcagca accggaatca ccggcgcccc acgtaccctc gcacgggcca 540
catcaacccc ggcgcaccgc accagcaccg ccctgggcaa agatgccaat cggcgaaccc 600
ccgcccgctc cgtccagacc gtctgcgtcc ccggccgaac caccgacccg gcctgccccc 660
caacactccc gacgtgcgcg ccggggtcac cgctatcgca cagacaccga acgaaacgtc 720
gggaaggtag caactggtcc atccatccag gcgcggctgc gggcagagga agcatccggc 780
gcgcagctcg cccccggaac ggagccctcg ccagcgccgt tgggccaacc gagatcgtat 840
ctggctccgc ccacccgccc cgcgccgaca gaacctcccc ccagcccctc gccgcagcgc 900
aactccggtc ggcgtgccga gcgacgcgtc caccccgatt tagccgccca acatgccgcg 960
gcgcaacctg attcaattac ggccgcaacc actggcggtc gtcgccgcaa gcgtgcagcg 1020
ccggatctcg acgcgacaca gaaatcctta aggccggcgg ccaaggggcc gaaggtgaag 1080
aaggtgaagc cccagaaacc gaaggccacg aagccgccca aagtggtgtc gcagcgcggc 1140
tggcgacatt gggtgcatgc gttgacgcga atcaacctgg gcctgtcacc cgacgagaag 1200
tacgagctgg acctgcacgc tcgagtccgc cgcaatcccc gcgggtcgta tcagatcgcc 1260
gtcgtcggtc tcaaaggtgg ggctggcaaa accacgctga cagcagcgtt ggggtcgacg 1320
ttggctcagg tgcgggccga ccggatcctg gctctagacg cggatccagg cgccggaaac 1380
ctcgccgatc gggtagggcg acaatcgggc gcgaccatcg ctgatgtgct tgcagaaaaa 1440
gagctgtcgc actacaacga catccgcgca cacactagcg tcaatgcggt caatctggaa 1500
gtgctgccgg caccggaata cagctcggcg cagcgcgcgc tcagcgacgc cgactggcat 1560
ttcatcgccg atcctgcgtc gaggttttac aacctcgtct tggctgattg tggggccggc 1620
ttcttcgacc cgctgacccg cggcgtgctg tccacggtgt ccggtgtcgt ggtcgtggca 1680
agtgtctcaa tcgacggcgc acaacaggcg tcggtcgcgt tggactggtt gcgcaacaac 1740
ggttaccaag atttggcgag ccgcgcatgc gtggtcatca atcacatcat gccgggagaa 1800
cccaatgtcg cagttaaaga cctggtgcgg catttcgaac agcaagttca acccggccgg 1860
gtcgtggtca tgccgtggga caggcacatt gcggccggaa ccgagatttc actcgacttg 1920
ctcgacccta tctacaagcg caaggtcctc gaattggccg cagcgctatc cgacgatttc 1980
gagagggctg gacgtcgttg 2001



<210> 20 
<211> 1536 
<212> DNA 

<213> Mycobacterium tuberculosis
<220>

<223> DNA sequence Rv3877

<400> 20
ttgagcgcac ctgctgttgc tgctggtcct accgccgcgg gggcaaccgc tgcgcggcct 60
gccaccaccc gggtgacgat cctgaccggc agacggatga ccgatttggt actgccagcg 120
gcggtgccga tggaaactta tattgacgac accgtcgcgg tgctttccga ggtgttggaa 180
gacacgccgg ctgatgtact cggcggcttc gactttaccg cgcaaggcgt gtgggcgttc 240
gctcgtcccg gatcgccgcc gctgaagctc gaccagtcac tcgatgacgc cggggtggtc 300
gacgggtcac tgctgactct ggtgtcagtc agtcgcaccg agcgctaccg accgttggtc 360
gaggatgtca tcgacgcgat cgccgtgctt gacgagtcac ctgagttcga ccgcacggca 420
ttgaatcgct ttgtgggggc ggcgatcccg cttttgaccg cgcccgtcat cgggatggcg 480
atgcgggcgt ggtgggaaac tgggcgtagc ttgtggtggc cgttggcgat tggcatcctg 540
gggatcgctg tgctggtagg cagcttcgtc gcgaacaggt tctaccagag cggccacctg 600
gccgagtgcc tactggtcac gacgtatctg ctgatcgcaa ccgccgcagc gctggccgtg 660
ccgttgccgc gcggggtcaa ctcgttgggg gcgccacaag ttgccggcgc cgctacggcc 720
gtgctgtttt tgaccttgat gacgcggggc ggccctcgga agcgtcatga gttggcgtcg 780
tttgccgtga tcaccgctat cgcggtcatc gcggccgccg ctgccttcgg ctatggatac 840
caggactggg tccccgcggg ggggatcgca ttcgggctgt tcattgtgac gaatgcggcc 900
aagctgaccg tcgcggtcgc gcggatcgcg ctgccgccga ttccggtacc cggcgaaacc 960
gtggacaacg aggagttgct cgatcccgtc gcgaccccgg aggctaccag cgaagaaacc 1020
ccgacctggc aggccatcat cgcgtcggtg cccgcgtccg cggtccggct caccgagcgc 1080
agcaaactgg ccaagcaact tctgatcgga tacgtcacgt cgggcaccct gattctggct 1140
gccggtgcca tcgcggtcgt ggtgcgcggg cacttctttg tacacagcct ggtggtcgcg 1200
ggtttgatca cgaccgtctg cggatttcgc tcgcggcttt acgccgagcg ctggtgtgcg 1260
tgggcgttgc tggcggcgac ggtcgcgatt ccgacgggtc tgacggccaa actcatcatc 1320
tggtacccgc actatgcctg gctgttgttg agcgtctacc tcacggtagc cctggttgcg 1380
ctcgtggtgg tcgggtcgat ggctcacgtc cggcgcgttt caccggtcgt aaaacgaact 1440
ctggaattga tcgacggcgc catgatcgct gccatcattc ccatgctgct gtggatcacc 1500
ggggtgtacg acacggtccg caatatccgg ttctga 1536



<210> 21 
<211> 840 
<212> DNA 

<213> Mycobacterium tuberculosis 
<220> 

<223> DNA sequence Rv3878
<400> 21

atggctgaac cgttggccgt cgatcccacc ggcttgagcg cagcggccgc gaaattggcc 60
ggcctcgttt ttccgcagcc tccggcgccg atcgcggtca gcggaacgga ttcggtggta 120
gcagcaatca acgagaccat gccaagcatc gaatcgctgg tcagtgacgg gctgcccggc 180
gtgaaagccg ccctgactcg aacagcatcc aacatgaacg cggcggcgga cgtctatgcg 240
aagaccgatc agtcactggg aaccagtttg agccagtatg cattcggctc gtcgggcgaa 300
ggcctggctg gcgtcgcctc ggtcggtggt cagccaagtc aggctaccca gctgctgagc 360
acacccgtgt cacaggtcac gacccagctc ggcgagacgg ccgctgagct ggcaccccgt 420
gttgttgcga cggtgccgca actcgttcag ctggctccgc acgccgttca gatgtcgcaa 480
aacgcatccc ccatcgctca gacgatcagt caaaccgccc aacaggccgc ccagagcgcg 540
cagggcggca gcggcccaat gcccgcacag cttgccagcg ctgaaaaacc ggccaccgag 600
caagcggagc cggtccacga agtgacaaac gacgatcagg gcgaccaggg cgacgtgcag 660
ccggccgagg tcgttgccgc ggcacgtgac gaaggcgccg gcgcatcacc gggccagcag 720
cccggcgggg gcgttcccgc gcaagccatg gataccggag ccggtgcccg cccagcggcg 780
agtccgctgg cggcccccgt cgatccgtcg actccggcac cctcaacaac cacaacgttg 840



<210> 22 
<211> 2187 
<212> DNA 

<213> Mycobacterium tuberculosis 
<220> 

<223> DNA sequence Rv3879c 



<400> 22 

atgagtatta ccaggccgac gggcagctat gccagacaga tgctggatcc gggcggctgg 60
gtggaagccg atgaagacac tttctatgac cgggcccagg aatatagcca ggttttgcaa 120
agggtcaccg atgtattgga cacctgccgc cagcagaaag gccacgtctt cgaaggcggc 180
ctatggtccg gcggcgccgc caatgctgcc aacggcgccc tgggtgcaaa catcaatcaa 240
ttgatgacgc tgcaggatta tctcgccacg gtgattacct ggcacaggca tattgccggg 300 
ttgattgagc aagctaaatc cgatatcggc aataatgtgg atggcgctca acgggagatc 360 
gatatcctgg agaatgaccc tagcctggat gctgatgagc gccataccgc catcaattca 420 
ttggtcacgg cgacgcatgg ggccaatgtc agtctggtcg ccgagaccgc tgagcgggtg 480 
ctggaatcca agaattggaa acctccgaag aacgcactcg aggatttgct tcagcagaag 540 
tcgccgccac ccccagacgt gcctaccctg gtcgtgccat ccccgggcac accgggcaca 600 
ccgggaaccc cgatcacccc gggaaccccg atcaccccgg gaaccccaat cacacccatc 660 
ccgggagcgc cggtaactcc gatcacacca acgcccggca ctcccgtcac gccggtgacc 720 
ccgggcaagc cggtcacccc ggtgaccccg gtcaaaccgg gcacaccagg cgagccaacc 780 
ccgatcacgc cggtcacccc cccggtcgcc ccggccacac cggcaacccc ggccacgccc 840 
gttaccccag ctcccgctcc acacccgcag ccggctccgg caccggcgcc atcgcctggg 900 
ccccagccgg ttacaccggc cactcccggt ccgtctggtc cagcaacacc gggcacccca 960 
gggggcgagc cggcgccgca cgtcaaaccc gcggcgttgg cggagcaacc tggtgtgccg 1020 
SSgLtg ciigcggggg gacgcagtcg gggcctgccc atgcggacga atccgccgcg 1080 
tcggtgacgc cggctgcggc gtccggtgtc ccgggcgcac gggcggcggc cgccgcgccg 1140 
agcggtaccg ccgtgggagc gggcgcgcgt tcgagcgtgg gtacggccgc S9 CC **99° "00 
gcggggtcgc atgctgccac tgggcgggcg ccggtggcta cctcggacaa ggcggcggca 1260 
ccgagcacgc gggcggcctc ggcgcggacg gcacctcctg cccgcccgcc S^gaccgat 1320 
cacatcgaca aacccgatcg cagcgagtct gcagatgacg gtacgccggt gtcgatgatc 1380 
ccggtgtcgg cggctcgggc ggcacgcgac gccgccactg cagctgccag cgcccgccag 1440 
cg^gccgci glgatgcgct gcggttggcg cgacgcatcg cggcggcgcc caacgcgtcc 1500 
glcaacaacl cgggcgacta cgggttcttc tggatcaccg cggtgaccac cgacggttcc 1560 
atcgtcgtgg ccaacagcta tgggctggcc tacatacccg acgggatgga attgccgaat 1620 
aaggtgtact tggccagcgc ggatcacgca atcccggttg acgaaattgc acgctgtgcc 1680 
accLcccgg ttttggccgt gcaagcctgg gcggctttcc acgacatgac ^tgcgggcg 1740 
gtgatcggta ccgcggagca gttggccagt tcggatcccg gtgtggccaa gattgtgctg 1800 
glgccagatg acatlccgga gagcggcaaa atgacgggcc ggtcgcggct ^ggtcgtc 1860 
gacccc?cgg cggcggctca gctggccgac actaccgatc agcgtttgct ^ttgttg 1920 
ccgccggcgc cggtggatgt caatccaccg ggcgatgagc ggcacatgct gtggttcgag 1980 
ctgltglagc ccatSccag caccgctacc ggccgcgagg ccgctcatct ^gggcgttc 2040 
cgggcctacg ctgcccactc acaggagatt gccctgcacc aagcgcacac tgcgactgac 2100 
gcSccgtc? agcgtgtggc cgtcgcggac tggctgtact ggcaatacgt caccgggttg 2160 
ctcgaccggg ccctggccgc cgcatgc 2187



<210> 23 
<211> 345 
<212> DNA 

<213> Mycobacterium tuberculosis 
<220> 

<223> DNA sequence Rv3880c 
<400> 23 



gtgagcatgg acgaattgga cccgcatgtc gcccgggcgt tgacgctggc 99 c 9^" 60 
caatcaaccc tagacgggac gctcaatcag atgaacaacg gatccttccg cgccaccgac 120 
gScSaga ccltcgagt lacgatcaat gggcaccagt ggctcaccgg cctgcgcatc 180 
Salat|g?t tg?tgaagaa gctgggtgcc gaggcggtgg ctcagcgggt ^aacgaggcg 240 
?tgcacaatg cgcaggccgc ggcgtccgcg tataacgacg cggcgggcga gcagctgacc 300 
gctgcgttat cggccatgtc ccgcgcgatg aacgaaggaa tggcc 345



<210> 24 
<211> 1380 
<212> DNA 

<213> Mycobacterium tuberculosis 



<220> 

<223> DNA sequence Rv3881c 



<400> 24
atgacgcagt cgcagaccgt gacggtggat cagcaagaga ttttgaacag ggccaacgag 60 
gtggaggccc cgatggcgga cccaccgact gatgtcccca tcacaccgtg cgaactcacg 120 
gcggctaaaa acgccgccca acagctggta ttgtccgccg acaacatgcg ggaatacctg 180 
gcggccggtg ccaaagagcg gcagcgtctg gcgacctcgc tgcgcaacgc ggccaaggcg 240 
tatggcgagg ttgatgagga ggctgcgacc gcgctggaca acgacggcga aggaactgtg 300 
caggcagaat cggccggggc cgtcggaggg gacagttcgg ccgaactaac cgatacgccg 360 
agggtggcca cggccggtga acpcaacttc atggatctca aagaagcggc aaggaagctc 420 
gaaacgggcg accaaggcgc atcgctcgcg cactttgcgg atgggtggaa cactttcaac 480 
ctgacgctgc aaggcgacgt caagcggttc cgggggtttg acaactggga aggcgatgcg 540 
gctaccgctt gcgaggcttc gctcgatcaa caacggcaat ggatactcca catggccaaa 600 
ttgagcgctg cgatggccaa gcaggctcaa tatgtcgcgc agctgcacgt gtgggctagg 660 
cgggaacatc cgacttatga agacatagtc gggctcgaac ggctttacgc ggaaaaccct 720 
tcggcccgcg accaaattct cccggtgtac gcggagtatc agcagaggtc ggagaaggtg 780 
ctgaccgaat acaacaacaa ggcagccctg gaaccggtaa acccgccgaa gcctcccccc 840 
gccatcaaga tcgacccgcc cccgcctccg caagagcagg gattgatccc tggcttcctg 900 
atgccgccgt ctgacggctc cggtgtgact cccggtaccg ggatgccagc cgcaccgatg 960 
gttccgccta ccggatcgcc gggtggtggc ctcccggctg acacggcggc gcagctgacg 1020 
tcggctgggc gggaagccgc agcgctgtcg ggcgacgtgg cggtcaaagc ggcatcgctc 1080 
ggtggcggtg gaggcggcgg ggtgccgtcg gcgccgttgg gatccgcgat cgggggcgcc 1140 
gaatcggtgc ggcccgctgg cgctggtgac attgccggct taggccaggg aagggccggc 1200 
ggcggcgccg cgctgggcgg cggtggcatg ggaatgccga tgggtgccgc gcatcaggga 1260 
caagggggcg ccaagtccaa gggttctcag caggaagacg aggcgctcta caccgaggat 1320 
cgggcatgga ccgaggccgt cattggtaac cgtcggcgcc aggacagtaa ggagtcgaag 1380 



<210> 25 
<211> 1386 
<212> DNA 

<213> Mycobacterium tuberculosis 
<220> 

<223> DNA sequence Rv3882c 
<400> 25 



atgagaaatc ctttagggct gcggttcagc accgggcacg ccttgcttgc ctccgcgttg 60 
gccccgccat gcatcatcgc attcttggag acgcgctact ggtgggcggg gattgcgctg 120 
gcctcgttgg gcgtcatcgt ggccacggtc actttctacg gccgccggat caccggctgg 180 
gtggcggcgg ?g?acgcgtg gttgcggcgg cgccgacggc ccccggattc ctcgtcagaa 240 
ccc^tggtcg gggccLcgt gaagccagga gatcacgttg cggtgcgctg a^aaggcgag 300 
tttctggtcg ccgtaatcga gctcattccc cgaccattca cgccgacggt catcgtcgac 360
gggcaagccc aclccgacga catgctggac accggactgg tggaggagct cctgtcggtg 420 
?actgtcccg acttglaggc cgatatcgtc tcagccggct accgcgtcgg caataccgca 480 
gcgccggacg tggtgagtct gtatcagcag gtgatcggga cagacccggc gccggcgaac 540 
cgccgglcc? gStcg?gct gcgcgccgac ccggaacgca cccgcaaatc ^cagcgc 600 
clcga?gaag gcgtcgcagg actggcccgg tatttggtgg cgtccgcgac gcgcattgcc 660 
gatcgactgl ctlgcSatgg tgtcgacgcg gtgtgtggcc gcagcttcga tgactacgac 720 
cacgccaclg acatcggctt tgtgcgggag aaatggtcga tgatcaaggg gcgcga^P 780 
tacactgccg cctacgcggc gcccggaggt ccggatgtat ggtggtcggc acgcgcggac 840 
cacaccltcl ccagagtccg ggtcgcgccg gggatggccc cgcagtccac 99tgttgctg 900 
accacggcgg acaagcccaa gacacccagg ggcttcgccc gcctatttgg C99gcagcgg 960 
cccgcgctgc aaggccagca tctggtggcc aaccgccact gccagctgcc gatcgggtca 1020 
gctlgggtac tg|tcggcga gacggtgaac cgatgcccgg tctacatgcc cttcgacgat 1080 
gtcgacatcg ccctcaacct gggtgacgct cagacattca cccagttcgt ggtgcgtgcg 1140 
Icglcggcag gtgcgatggt cacagtcggg ccacagttcg aggaatttgc ccggttgatc 1200 
|g?|cSacI ?cgggcagga ggtaaaggtg gcgtggccga atgcgacgac ctatctcggc 1260 
ccgcatcccg gtattgaccg ggtgattctg cggcacaatg tgatcggtac cccgcggcat 1320 
cggcagctgc cgattcgccg ggtttcccca cccgaggaaa gccgctacca gatggcgctg 1380 



ccgaag 1386
<210> 26
<211> 1338 
<212> DNA 

<213> Mycobacterium tuberculosis 
<220> 

<223> DNA sequence Rv3883c 



gtgcaccgta tctttctgat cacggtggcg ctggcgttgc tcaccgcgtc gcccgcatcg 60 
gccatcacgc caccgccgat cgatccgggc gcgttgccgc ccgacgtgac gggcccggat 120 
cagcctaccg aacagcgcgt tttgtgcgcg tcgcccacca cgctgccggg gtccgggttc 180 
cacgatccgc cgtggagcaa cacgtatctg ggcgtggccg atgcccacaa gttcgcgacc 240 
ggggccgggg tgacggtggc ggtgatcgac accggtgtcg acgcttcgcc acgggtcccg 300 
gcggaaccS gcggcgattt cgtcgaccag gccggtaacg gcctgtctga ctgtgatgcc 360 
catgggactc tcacagcatc catcatcgcg ggccggcccg cgcccaccga cgggttcgtc 420 
ggcgtcgcgc ccgacgctcg actgctctcg ctacgtcaga cgtctgaggc cttcgaaccg 480 
gtcggctcac aagccaaccc gaatgacccc aacgccaccc cggccgccgg ttccatccgc 540 
agtcttgccc gcgccgtggt gcacgccgcc aacctcggcg tgggtgtgat caacatcagt 600 
gaagccgcct gctacaaggt gagcaggccg atcgatgaaa cctcactggg tgcatccatc 660 
gactatgcgg tcaacgtcaa aggcgtggtg gtggtggtcg cggccggcaa caccggtggc 720 
gattgcgtac agaatccggc gccggacccg tccacacccg gcgacccacg cggctggaac 780 
aatgtgcaga ccgttgtcac cccggcgtgg tacgcaccgc tggtgttaag cgtcggcggt 840 
atcggccaga ccgggatgcc cagctcgttc tcgatgcacg gaccgtgggt ggacgtggcc 900 
gcgcccgcag aaaacatcgt cgcgctcggc gacaccggtg aaccggtgaa tgcgctgcaa 960 
ggccgggagg ggccggtacc catcgccggc acctcgtttg ccgcggcata tgtgtcgggt 1020 
SggSgccc ?gct?cggca gcggttcccc gacctgacgc cggcgcagat catccaccgg 1080 
atcaccgcca ccgcgagaca ccccgggggc ggggtcgacg acctggtcgg cgccggcgtc 1140 
atcgatgcgg tggccgcgct gacgtgggac attccgcccg gccctgcttc ggcgccatac 1200 
aacgtcaglc gacttccacc cccggtggtg gagccgggtc ccgatcgtcg cccgattacg 1260 
gctgtggcgt tggtggccgt cggccttacg ttggccctgg gcctgggcgc gctggctaga 1320 



cgggcgctga gccgccga 1338



<210> 27 
<211> 1857 
<212> DNA 

<213> Mycobacterium tuberculosis 
<220> 

<223> DNA sequence Rv3884c

<400> 27

atgtcgagaa tggtggacac gatgggtgat ttactcactg cgcgccggca tttcgatcgg 60
gcgatgacga tcaagaatgg ccagggatgc gtggcggcgt tgcctgagtt tgtggctgcc 120
accgaggccg atccgtcgat ggccgacgcg tggctgggtc gtatcgcctg cggtgaccgc 180
gatctggcct cgcttaagca gctcaacgcc catagcgagt ggctgcaccg cgagaccacg 240
cggatcggcc ggacgttggc cgctgaggtc cagctgggac catccatcgg gatcacggtg 300
accgacgcat ctcaggtggg gctggcgctg tcgtcggcgt tgacgatcgc gggggagtat 360
gcgaaggccg atgccctgtt agcaaaccgc gagctattgg attcgtggcg caactaccag 420
tggcatcagc tggctcgggc gttcctgatg tacgtcacgc agcgatggcc cgacgtgttg 480
tcgacggccg ccgaggatct gccgccacag gcgatcgtca tgccggcggt gaccgcgtcg 540
atttgtgcgc tggcagccca cgccgccgcc catctcgggc aggggcgagt ggccctggac 600
tggctggacc gggtggacgt gatcggacac agcaggtcat cggagcggtt cggcgccgac 660
gtgctcaccg cggcgatcgg accggccgat attccgctgc tggtcgccga cttggcgtat 720
gtgcggggga tggtgtaccg gcaactgcat gaggaggaca aggcccagat ctggctgtcg 780
aaggccacca tcaacggggt gctcaccgac gccgccaaag aagccctggc ggacccgaac 840
ctgcgcttga ttgttaccga tgaacgaacc atcgccagcc gctccgaccg ttgggatgct 900
tcgacggcga aaagccgcga ccagctcgat gacgacaatg cagcgcagcg gcgcggcgag 960
ctgctagccg agggccggga actgctggcc aaacaggtgg gcctggcggc ggtcaagcaa 1020
gcggtatcgg cgctggaaga ccaactcgag gtgcgcatga tgcgcctaga gcacggccta 1080
ccggtggagg ggcagaccaa ccacatgttg ctggtggggc caccaggcac aggtaagaca 1140
accaccgctg aagcgctcgg caagatctac gccggcatgg ggatcgtgcg tcaccccgaa 1200
attcgagaag ttcgccgatc ggacttctgt gggcactaca tcggggagtc aggacccaag 1260
acgaacgagc tgatcgaaaa gtcactcggg cgaatcattt tcatggacga gttctactcg 1320
ctgatcgaac gtcatcaaga cggaacaccg gacatgatcg gcatggaggc ggtcaatcaa 1380
ctcctggttc aattggaaac acaccgattc gacttctgtt tcatcggggc cggctatgag 1440
gatcaggtgg atgaattcct caccgtgaac ccgggtttgg ctggccggtt caaccgaaag 1500
ctgcggttcg agtcttattc gccggtggag atcgtcgaga ttggacaccg ctacgctaca 1560
ccgcgcgcca gccagctcga tgacgccgca cgggaggtat tcctcgacgc ggtcaccacc 1620
atccgtaact acaccacccc tagtgggcag cacggtatcg acgctatgca aaacggtcgg 1680
ttcgcccgca acgtgatcga acgcgccgaa gggttccggg acacccgggt ggttgcgcaa 1740
aaacgtgcgg gccaaccggt atcggttcag gatctgcaga tcatcaccgc caccgacatc 1800
gatgccgcga tacgcagcgt gtgctcagac aaccgagaca tggccgcgat cgtttgg 1857



<210> 28 
<211> 1611 
<212> DNA 

<213> Mycobacterium tuberculosis 
<220> 

<223> DNA sequence Rv3885c 



<400> 28
ttgacgtcca agctgaccgg gttcagcccg cgcagtgcga ggcgggtcgc cggggtgtgg 60
acagtgttcg tgctcgcgtc ggcgggctgg gcgctgggcg gccagctagg tgcggtcatg 120
gctgtcgtgg tcggcgtcgc cttggtgttc gtacagtggt ggggtcagcc ggcgtggtcg 180
tgggcggtac tggggctgcg gggtcggcgt cccgtcaaat ggaatgaccc aattaccttg 240
gccaacaacc gatccggggg tggcgtccgc gtgcaagacg gtgtcgcggt ggtggcggtg 300
caacttctcg gccgagcgca ccgggcgact acggtcaccg ggtcggtgac cgtagaaagc 360
gacaacgtga ttgacgtcgt tgagctcgcg ccgttgctgc gccacccgct ggacctggaa 420
ctcgattcaa tcagcgtcgt caccttcggc tcgcgaaccg gcaccgtcgg cgattacccg 480
cgggtgtatg acgcggagat cggtacgccg ccgtatgccg ggcggcgcga aacgtggctg 540
atcatgcggc ttccggtgat cggcaacacc caagctttac gctggcgtac cagcgttggg 600
gccgctgcca tttcggtcgc ccaacgcgtt gccagctccc tgcgctgtca gggcttgcgc 660
gccaaactgg ccaccgcaac agacttggct gagcttgatc gccggctggg gtcggacgcg 720
gtagccggga gtgcgcagcg ctggaaagct atccgcggtg aagccgggtg gatgacgacg 780
tatgcgtacc cggctgaggc gatttcgtcg cgggttctct cgcaagcctg gacgctgcgt 840
gccgatgagg tcatccagaa cgtaacggtg tatccggacg cgacgtgcac cgcgaccatc 900
accgtgcgca caccgacgcc ggcgcctacc ccgcccagtg tgatcttgcg tcggctcaat 960
ggtgagcaag ccgccgcggc tgcggccaac atgtgcgggc cacgtccaca cctacgcgga 1020
cagcggcgct gcccgttgcc ggcgcagcta gtcaccgaga tcggaccgtc gggggtgttg 1080
attggcaagc tgagcaacgg ggaccggctg atgattcccg ttaccgacgc cggtgagctg 1140
tcgcgcgtct tcgtggccgc ggacgacacg atcgccaaga ggatcgtgat tcgcgtcgtc 1200
ggtgccggtg agcgggtgtg tgtgcacact cgcgaccaag agcgttgggc cagcgtacgc 1260
atgccgcagt tgagcatcgt cggcacacca cggcccgcgc cgcgcaccac tgtcggcgtc 1320
gtggagtacg tgcggcggcg caagaacggc gatgacggca aatctgaagg cagcggtgtc 1380
gatgtcgcga tttcgcccac gccacggcca gccagtgtca tcaccattgc ccgacccggt 1440
acctcgctgt ccgagagcga tcggcatggc ttcgaggtga ccatcgaaca aatcgatcgg 1500
gcaacggtga aagtcggtgc ggcaggacaa aactggctgg ttgagatgga aatgttccgt 1560
gcggagaacc gctatgtcag ccttgagccg gtcacgatgt cgataggccg g 1611



<210> 29 
<211> 620 
<212> DNA 

<213> Mycobacterium tuberculosis 



<220> 

<223> CFP-10 + ESAT-6 



<400> 29
atggcagaga tgaagaccga tgccgctacc ctcgcgcagg aggcaggtaa tttcgagcgg
atctccggcg acctgaaaac ccagatcgac caggtggagt cgacggcagg ttcgttgcag
ggccagtggc gcggcgcggc ggggacggcc gcccaggccg cggtggtgcg cttccaagaa
gcagccaata agcagaagca ggaactcgac gagatctcga cgaatattcg tcaggccggc
gtccaatact cgagggccga cgaggagcag cagcaggcgc tgtcctcgca aatgggcttc
tgacccgcta atacgaaaag aaacggagca aaaacatgac agagcagcag tggaatttcg
cgggtatcga ggccgcggca agcgcaatcc agggaaatgt cacgtccatt cattccctcc
ttaacgaggg gaagcagtcc ctgaccaagc tcgcagcggc ctggggcggt agcggttcgg
aggcgtacca gggtgtccag caaaaatggg acgccacggc taccgagctg aacaacgcgc
tgcagaacct ggcgcggacg atcagcgaag ccggtcaggc aatggcttcg accgaaggca



<210> 30 
<211> 21 
<212> DNA 

<213> Artificial Sequence 

<223> Description of Artificial Sequence: Primer SP6-BAC1 

<400> 30
agttagctca ctcattaggc a 21

<210> 31 
<211> 21 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Primer T7-BAC1 

<400> 31
ggatgtgctg caaggcgatt a 21

<210> 32 
<211> 20 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Primer esat-6F 

<400> 32
gtcacgtcca ttcattccct 20

<210> 33 
<211> 19 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Primer esat-6R 



<400> 33

atcccagtga cgttgcctt 19






<210> 34 
<211> 20 
<212> DNA 

<213> Artificial Sequence 

<220>
<223> Description of Artificial Sequence: Primer RD1 
flanking region F 

<400> 34 

gcagtgcaaa ggtgcagata 



<210> 35 
<211> 20 
<212> DNA 

<213> Artificial Sequence 

<220>
<223> Description of Artificial Sequence: Primer RD1 
flanking region R 

<400> 35 

gattgagaca cttgccacga 



<210> 36 
<211> 20 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Primer 
plcA.int.F 

<400> 36 

caagttgggt ctggtcgaat 



<210> 37 
<211> 20 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Primer 
plcA.int.R

<400> 37 

gctacccaag gtctcctggt 



<210> 38 
<211> 153 
<212> DNA 

<213> Mycobacterium tuberculosis 



<220> 

<223> Sequences at the junction RD1 1 
<400> 38
caagacgagg ttgtaaaacc tcgacgcagg atcggcgatg aaatgccagt cggcgtcgct 60 
gagcgcgcgc tgcgccgagt cccattttgt cgctgatttg tttgaacagc gacgaaccgg 120 
tgttgaaaat gtcgcctggg tcggggattc cct 

<210> 39 
<211> 19 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Primer RD5 
flanking region F 

<400> 39
gaatgccgac gtcatatcg 19

<210> 40 
<211> 20 
<212> DNA 

<213> Artificial Sequence 

<220>

<223> Description of Artificial Sequence: Primer RD5 
flanking region R 

<400> 40
cggccactga gttcgattat 20

<210> 41 
<211> 152 
<212> DNA 

<213> Mycobacterium tuberculosis 

<220>

<223> Sequence at the junction RD5

<400> 41
cctcgatgaa ccacctgaca tgaccccatc ctttccaaga actggagtct ccggacatgc 60
cggggcggtt cactgcccca ggtgtcctgg gtcgttccgt tgaccgtcga gtccgaacat 120
ccgtcattcc cggtggcagt cggtgcggtg ac 152

<210> 42 
<211> 20 
<212> DNA 

<213> Artificial Sequence 

<220>

<223> Description of Artificial Sequence: Primer MiD1

flanking region F 

<400> 42
cagccaacac caagtagacg 20



<210> 43 
<211> 20 
<212> DNA

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Primer MiD1
flanking region R 

<400> 43 

tctacctgca gtcgcttgtg 



<210> 44 
<211> 123 
<212> DNA 

<213> Mycobacterium tuberculosis 
<220> 

<223> Sequence at the junction MiD1
<400> 44 

cacctgacat gaccccatcc tttccaagaa ctggagtctc cggacatgcc ggggcggttc 
agggacattc atgtccatct tctggcagat cagcagatcg cttgttctca gtgcaggtga 
gtc 



<210> 45 
<211> 20 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Primer MiD2 
flanking region R 

<400> 45 

gtccatcgag gatgtcgagt 



<210> 46 
<211> 20 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Primer MiD2 
flanking region L 

<400> 46 

ctaggccatt ccgttgtctg 



<210> 47 
<211> 151 
<212> DNA 

<213> Mycobacterium tuberculosis 
<220> 

<223> Sequence at the junction MiD2 



<400> 47

gctgcctact acgctcaacg ccagagacca gccgccggct gaggtctcag atcagagagt 60
ctccggactc accggggcgg ttcataaagg cttcgagacc ggacgggctg taggttcctc 120
aactgtgtgg cggatggtct gagcacttaa c 151

<210> 48 
<211> 15 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Primer MiD3 
flanking region R 

<400> 48
ggcgacgcca tttcc 15

<210> 49 
<211> 19 
<212> DNA 

<213> Artificial Sequence 

<220>

<223> Description of Artificial Sequence: Primer MiD3 

flanking region L 

<400> 49
aactgtcggg cttgctctt 19

<210> 50 
<211> 181 
<212> DNA 

<213> Mycobacterium tuberculosis 



<220> 

<223> Sequence at the junction MiD3 



tggcgccggc acctccgttg ccaccgttgc cgccgctggt gggcgcggtg ^cgttcgccc 
cSSgaacc gttcagggcc gggttcgccc tcagccgcta aacacgccga ccaagatcaa 
clagctacct icccggtcaa ggttgaagag cccccatatc agcaagggcc cggtgtcggc 



<210> 51 
<211> 108 
<212> PRT 

<213> Mycobacterium tuberculosis 
<220> 

<223> RV3861 - hypothetical protein 
<400> 51 

Val Thr Trp Leu Ala Asp Pro Val Gly Asn Ser Arg Ile Ala Arg Ala
1 5 10 15

Gln Ala Cys Lys Thr Ser Ile Ser Ala Pro Ile Val Glu Ser Trp Arg
20 25 30

Ala Gln Arg Gly Ala Gln Cys Gly Gln Arg Glu Lys Ser Cys Arg Cys
35 40 45

Ser Arg Ala Val His Ile Gln Gly Ile Ser Pro Pro Leu Phe Arg Arg
50 55 60

Pro Leu Glu Pro Ala Val Gln Ala Ala Val Ala Ser Cys Arg Leu Gly
65 70 75 80

Arg His Pro Val Val Ala His Arg Val Thr Val Ala Leu Gly Gln Gly
85 90 95

Ser Gln Leu Ala Gln Arg Glu Cys Pro Arg Pro Ala
100 105



<210> 52 
<211> 116 
<212> PRT 

<213> Mycobacterium tuberculosis 
<220> 

<223> WHIB6 - Possible transcriptional regulatory 
protein WHIB-like WHIB6 



Merir^Tyr Ala Phe Ala Ala Glu Ala Thr Thr Cys Asn Ala Phe Trp 
1 5 10 15

Arg Asn Val Asp Met Thr Val Thr Ala Leu Tyr Glu Val Pro Leu Gly
20 25 30

Val Cys Thr Gln Asp Pro Asp Arg Trp Thr Thr Thr Pro Asp Asp Glu
35 40 45

Ala Lys Thr Leu Cys Arg Ala Cys Pro Arg Arg Trp Leu Cys Ala Arg
50 55 60

Asp Ala Val Glu Ser Ala Gly Ala Glu Gly Leu Trp Ala Gly Val Val
65 70 75 80

Ile Pro Glu Ser Gly Arg Ala Arg Ala Phe Ala Leu Gly Gln Leu Arg
85 90 95

Ser Leu Ala Glu Arg Asn Gly Tyr Pro Val Arg Asp His Arg Val Ser
100 105 110

Ala Gln Ser Ala
115



<210> 53 
<211> 392 
<212> PRT 

<213> Mycobacterium tuberculosis 

<223> Rv3863 - hypothetical alanine rich protein

<400> 53 
Met Ala Gly Glu Arg Lys Val Cys Pro Pro Ser Arg Leu Val Pro Ala
1 5 10 15

Asn Lys Gly Ser Thr Gln Met Ser Lys Ala Gly Ser Thr Val Gly Pro
20 25 30

Ala Pro Leu Val Ala Cys Ser Gly Gly Thr Ser Asp Val Ile Glu Pro
35 40 45

Arg Arg Gly Val Ala Ile Ile Gly His Ser Cys Arg Val Gly Thr Gln
50 55 60

Ile Asp Asp Ser Arg Ile Ser Gln Thr His Leu Arg Ala Val Ser Asp
65 70 75 80

Asp Gly Arg Trp Arg Ile Val Gly Asn Ile Pro Arg Gly Met Phe Val
85 90 95

Gly Gly Arg Arg Gly Ser Ser Val Thr Val Ser Asp Lys Thr Leu Ile
100 105 110

Arg Phe Gly Asp Pro Pro Gly Gly Lys Ala Leu Thr Phe Glu Val Val
115 120 125

Arg Pro Ser Asp Ser Ala Ala Gln His Gly Arg Val Gln Pro Ser Ala
130 135 140
130 135 140 



Asp Leu Ser Asp Asp Pro Ala His Asn Ala Ala Pro Val Ala Pro Asp 
145 150 155 160 

Pro Gly Val Val Arg Ala Gly Ala Ala Ala Ala Ala Arg Arg Arg Glu 



165 



170 175 



Leu Asp He Ser Gin Arg Ser Leu Ala Ala Asp Gly He He Asn Ala 
180 185 190 

Gly Ala Leu He Ala Phe Glu Lys Gly Arg Ser Trp Pro Arg Glu Arg 
195 200 205 

Thr Arg Ala Lys Leu Glu Glu Val Leu Gin Trp Pro Ala Gly Thr He 
210 215 220 

Ala Arg He Arg Arg Gly Glu Pro Thr Glu Pro Ala Thr Asn Pro Asp 
225 230 235 240 

Ala Ser Pro Gly Leu Arg Pro Ala Asp Gly Pro Ala Ser Leu He Ala 
245 250 255 

Gin Ala Val Thr Ala Ala Val Asp Gly Cys Ser Leu Ala He Ala Ala 
260 265 270 

Leu Pro Ala Thr Glu Asp Pro Glu Phe Thr Glu Arg Ala Ala Pro He 
275 280 285 

Leu Ala Asp Leu Arg Gin Leu Glu Ala He Ala Val Gin Ala Thr Arg 
290 " 295 300 

He Ser Arg He Thr Pro Glu Leu He Lys Ala Leu Gly Ala Val Arg 
305 ~ 310 315 320. 

Arq His His Asp Glu Leu Met Arg Leu Gly Ala Thr Ala Pro Gly Ala 
325 330 335 

Thr Leu Ala Gin Arg Leu Tyr Ala Ala Arg Arg Arg Ala Asn Leu Ser 



WO 03/085098 PCT/IB03/01789 

340 345 350

Thr Leu Glu Thr Ala Gln Ala Ala Gly Val Ala Glu Glu Met Ile Val
355 360 365

Gly Ala Glu Ala Glu Glu Glu Leu Pro Ala Glu Ala Thr Glu Ala Ile
370 375 380

Glu Ala Leu Ile Arg Gln Ile Asn
385 390



<210> 54 
<211> 402 
<212> PRT 

<213> Mycobacterium tuberculosis 
<220> 

<223> Rv3864 - conserved hypothetical protein
<400> 54

Met Ala Ser Gly Ser Gly Leu Cys Lys Thr Thr Ser Asn Phe Ile Trp
1 5 10 15

Gly Gln Leu Leu Leu Leu Gly Glu Gly Ile Pro Asp Pro Gly Asp Ile
20 25 30

Phe Asn Thr Gly Ser Ser Leu Phe Lys Gln Ile Ser Asp Lys Met Gly
35 40 45

Leu Ala Ile Pro Gly Thr Asn Trp Ile Gly Gln Ala Ala Glu Ala Tyr
50 55 60

Leu Asn Gln Asn Ile Ala Gln Gln Leu Arg Ala Gln Val Met Gly Asp
65 70 75 80

Leu Asp Lys Leu Thr Gly Asn Met Ile Ser Asn Gln Ala Lys Tyr Val
85 90 95

Ser Asp Thr Arg Asp Val Leu Arg Ala Met Lys Lys Met Ile Asp Gly
100 105 110

Val Tyr Lys Val Cys Lys Gly Leu Glu Lys Ile Pro Leu Leu Gly His
115 120 125

Leu Trp Ser Trp Glu Leu Ala Ile Pro Met Ser Gly Ile Ala Met Ala
130 135 140

Val Val Gly Gly Ala Leu Leu Tyr Leu Thr Ile Met Thr Leu Met Asn
145 150 155 160

Ala Thr Asn Leu Arg Gly Ile Leu Gly Arg Leu Ile Glu Met Leu Thr
165 170 175

Thr Leu Pro Lys Phe Pro Gly Leu Pro Gly Leu Pro Ser Leu Pro Asp
180 185 190

Ile Ile Asp Gly Leu Trp Pro Pro Lys Leu Pro Asp Ile Pro Ile Pro
195 200 205

Gly Leu Pro Asp Ile Pro Gly Leu Pro Asp Phe Lys Trp Pro Pro Thr
210 215 220

Pro Gly Ser Pro Leu Phe Pro Asp Leu Pro Ser Phe Pro Gly Phe Pro
225 230 235 240

Gly Phe Pro Glu Phe Pro Ala Ile Pro Gly Phe Pro Ala Leu Pro Gly
245 250 255

Leu Pro Ser Ile Pro Asn Leu Phe Pro Gly Leu Pro Gly Leu Gly Asp
260 265 270

Leu Leu Pro Gly Val Gly Asp Leu Gly Lys Leu Pro Thr Trp Thr Glu
275 280 285

Leu Ala Ala Leu Pro Asp Phe Leu Gly Gly Phe Ala Gly Leu Pro Ser
290 295 300

Leu Gly Phe Gly Asn Leu Leu Ser Phe Ala Ser Leu Pro Thr Val Gly
305 310 315 320

Gln Val Thr Ala Thr Met Gly Gln Leu Gln Gln Leu Val Ala Ala Gly
325 330 335

Gly Gly Pro Ser Gln Leu Ala Ser Met Gly Ser Gln Gln Ala Gln Leu
340 345 350

Ile Ser Ser Gln Ala Gln Gln Gly Gly Gln Gln His Ala Thr Leu Val
355 360 365

Ser Asp Lys Lys Glu Asp Glu Glu Gly Val Ala Glu Ala Glu Arg Ala
370 375 380

Pro Ile Asp Ala Gly Thr Ala Ala Ser Gln Arg Gly Gln Glu Gly Thr
385 390 395 400

Val Leu



<210> 55 
<211> 103 
<212> PRT 

<213> Mycobacterium tuberculosis 
<220> 

<223> Rv3865 - conserved hypothetical protein

<400> 55

Met Thr Gly Phe Leu Gly Val Val Pro Ser Phe Leu Lys Val Leu Ala
1 5 10 15

Gly Met His Asn Glu Ile Val Gly Asp Ile Lys Arg Ala Thr Asp Thr
20 25 30

Val Ala Gly Ile Ser Gly Arg Val Gln Leu Thr His Gly Ser Phe Thr
35 40 45

Ser Lys Phe Asn Asp Thr Leu Gln Glu Phe Glu Thr Thr Arg Ser Ser
50 55 60

Thr Gly Thr Gly Leu Gln Gly Val Thr Ser Gly Leu Ala Asn Asn Leu
65 70 75 80

Leu Ala Ala Ala Gly Ala Tyr Leu Lys Ala Asp Asp Gly Leu Ala Gly
85 90 95

Val Ile Asp Lys Ile Phe Gly
100



<210> 56 
<211> 283 
<212> PRT 

<213> Mycobacterium tuberculosis 
<220> 

<223> Rv3866 - conserved hypothetical protein 
<400> 56 

Met Thr Gly Pro Ser Ala Ala Gly Arg Ala Gly Thr Ala Asp Asn Val
1 5 10 15

Val Gly Val Glu Val Thr Ile Asp Gly Met Leu Val Ile Ala Asp Arg
20 25 30

Leu His Leu Val Asp Phe Pro Val Thr Leu Gly Ile Arg Pro Asn Ile
35 40 45

Pro Gln Glu Asp Leu Arg Asp Ile Val Trp Glu Gln Val Gln Arg Asp
50 55 60

Leu Thr Ala Gln Gly Val Leu Asp Leu His Gly Glu Pro Gln Pro Thr
65 70 75 80

Val Ala Glu Met Val Glu Thr Leu Gly Arg Pro Asp Arg Thr Leu Glu
85 90 95

Gly Arg Trp Trp Arg Arg Asp Ile Gly Gly Val Met Val Arg Phe Val
100 105 110

Val Cys Arg Arg Gly Asp Arg His Val Ile Ala Ala Arg Asp Gly Asp
115 120 125

Met Leu Val Leu Gln Leu Val Ala Pro Gln Val Gly Leu Ala Gly Met
130 135 140

Val Thr Ala Val Leu Gly Pro Ala Glu Pro Ala Asn Val Glu Pro Leu
145 150 155 160

Thr Gly Val Ala Thr Glu Leu Ala Glu Cys Thr Thr Ala Ser Gln Leu
165 170 175

Thr Gln Tyr Gly Ile Ala Pro Ala Ser Ala Arg Val Tyr Ala Glu Ile
180 185 190

Val Gly Asn Pro Thr Gly Trp Val Glu Ile Val Ala Ser Gln Arg His
195 200 205

Pro Gly Gly Thr Thr Thr Gln Thr Asp Ala Ala Ala Gly Val Leu Asp
210 215 220

Ser Lys Leu Gly Arg Leu Val Ser Leu Pro Arg Arg Val Gly Gly Asp
225 230 235 240

Leu Tyr Gly Ser Phe Leu Pro Gly Thr Gln Gln Asn Leu Glu Arg Ala
245 250 255

Leu Asp Gly Leu Leu Glu Leu Leu Pro Ala Gly Ala Trp Leu Asp His
260 265 270

Thr Ser Asp His Ala Gln Ala Ser Ser Arg Gly
275 280



<210> 57 
<211> 183 
<212> PRT 

<213> Mycobacterium tuberculosis
<220>

<223> Protein sequence Rv3867
<400> 57

Met Val Asp Pro Pro Gly Asn Asp Asp Asp His Gly Asp Leu Asp Ala
1 5 10 15

Leu Asp Phe Ser Ala Ala His Thr Asn Glu Ala Ser Pro Leu Asp Ala
20 25 30

Leu Asp Asp Tyr Ala Pro Val Gln Thr Asp Asp Ala Glu Gly Asp Leu
35 40 45

Asp Ala Leu His Ala Leu Thr Glu Arg Asp Glu Glu Pro Glu Leu Glu
50 55 60

Leu Phe Thr Val Thr Asn Pro Gln Gly Ser Val Ser Val Ser Thr Leu
65 70 75 80

Met Asp Gly Arg Ile Gln His Val Glu Leu Thr Asp Lys Ala Thr Ser
85 90 95

Met Ser Glu Ala Gln Leu Ala Asp Glu Ile Phe Val Ile Ala Asp Leu
100 105 110

Ala Arg Gln Lys Ala Arg Ala Ser Gln Tyr Thr Phe Met Val Glu Asn
115 120 125

Ile Gly Glu Leu Thr Asp Glu Asp Ala Glu Gly Ser Ala Leu Leu Arg
130 135 140

Glu Phe Val Gly Met Thr Leu Asn Leu Pro Thr Pro Glu Glu Ala Ala
145 150 155 160

Ala Ala Glu Ala Glu Val Phe Ala Thr Arg Tyr Asp Val Asp Tyr Thr
165 170 175

Ser Arg Tyr Lys Ala Asp Asp
180



<210> 58 
<211> 573 
<212> PRT 
<213> Mycobacterium tuberculosis
<220>

<223> Protein sequence Rv3868
<400> 58

Met Thr Asp Arg Leu Ala Ser Leu Phe Glu Ser Ala Val Ser Met Leu
1 5 10 15

Pro Met Ser Glu Ala Arg Ser Leu Asp Leu Phe Thr Glu Ile Thr Asn
20 25 30

Tyr Asp Glu Ser Ala Cys Asp Ala Trp Ile Gly Arg Ile Arg Cys Gly
35 40 45

Asp Thr Asp Arg Val Thr Leu Phe Arg Ala Trp Tyr Ser Arg Arg Asn
50 55 60

Phe Gly Gln Leu Ser Gly Ser Val Gln Ile Ser Met Ser Thr Leu Asn
65 70 75 80

Ala Arg Ile Ala Ile Gly Gly Leu Tyr Gly Asp Ile Thr Tyr Pro Val
85 90 95

Thr Ser Pro Leu Ala Ile Thr Met Gly Phe Ala Ala Cys Glu Ala Ala
100 105 110

Gln Gly Asn Tyr Ala Asp Ala Met Glu Ala Leu Glu Ala Ala Pro Val
115 120 125

Ala Gly Ser Glu His Leu Val Ala Trp Met Lys Ala Val Val Tyr Gly
130 135 140

Ala Ala Glu Arg Trp Thr Asp Val Ile Asp Gln Val Lys Ser Ala Gly
145 150 155 160

Lys Trp Pro Asp Lys Phe Leu Ala Gly Ala Ala Gly Val Ala His Gly
165 170 175

Val Ala Ala Ala Asn Leu Ala Leu Phe Thr Glu Ala Glu Arg Arg Leu
180 185 190

Thr Glu Ala Asn Asp Ser Pro Ala Gly Glu Ala Cys Ala Arg Ala Ile
195 200 205

Ala Trp Tyr Leu Ala Met Ala Arg Arg Ser Gln Gly Asn Glu Ser Ala
210 215 220

Ala Val Ala Leu Leu Glu Trp Leu Gln Thr Thr His Pro Glu Pro Lys
225 230 235 240

Val Ala Ala Ala Leu Lys Asp Pro Ser Tyr Arg Leu Lys Thr Thr Thr
245 250 255

Ala Glu Gln Ile Ala Ser Arg Ala Asp Pro Trp Asp Pro Gly Ser Val
260 265 270

Val Thr Asp Asn Ser Gly Arg Glu Arg Leu Leu Ala Glu Ala Gln Ala
275 280 285

Glu Leu Asp Arg Gln Ile Gly Leu Thr Arg Val Lys Asn Gln Ile Glu
290 295 300

Arg Tyr Arg Ala Ala Thr Leu Met Ala Arg Val Arg Ala Ala Lys Gly
305 310 315 320

Met Lys Val Ala Gln Pro Ser Lys His Met Ile Phe Thr Gly Pro Pro
325 330 335

Gly Thr Gly Lys Thr Thr Ile Ala Arg Val Val Ala Asn Ile Leu Ala
340 345 350

Gly Leu Gly Val Ile Ala Glu Pro Lys Leu Val Glu Thr Ser Arg Lys
355 360 365

Asp Phe Val Ala Glu Tyr Glu Gly Gln Ser Ala Val Lys Thr Ala Lys
370 375 380

Thr Ile Asp Gln Ala Leu Gly Gly Val Leu Phe Ile Asp Glu Ala Tyr
385 390 395 400

Ala Leu Val Gln Glu Arg Asp Gly Arg Thr Asp Pro Phe Gly Gln Glu
405 410 415

Ala Leu Asp Thr Leu Leu Ala Arg Met Glu Asn Asp Arg Asp Arg Leu
420 425 430

Val Val Ile Ile Ala Gly Tyr Ser Ser Asp Ile Asp Arg Leu Leu Glu
435 440 445

Thr Asn Glu Gly Leu Arg Ser Arg Phe Ala Thr Arg Ile Glu Phe Asp
450 455 460

Thr Tyr Ser Pro Glu Glu Leu Leu Glu Ile Ala Asn Val Ile Ala Ala
465 470 475 480

Ala Asp Asp Ser Ala Leu Thr Ala Glu Ala Ala Glu Asn Phe Leu Gln
485 490 495

Ala Ala Lys Gln Leu Glu Gln Arg Met Leu Arg Gly Arg Arg Ala Leu
500 505 510

Asp Val Ala Gly Asn Gly Arg Tyr Ala Arg Gln Leu Val Glu Ala Ser
515 520 525

Glu Gln Cys Arg Asp Met Arg Leu Ala Gln Val Leu Asp Ile Asp Thr
530 535 540

Leu Asp Glu Asp Arg Leu Arg Glu Ile Asn Gly Ser Asp Met Ala Glu
545 550 555 560

Ala Ile Ala Ala Val His Ala His Leu Asn Met Arg Glu
565 570



<210> 59 
<211> 480 
<212> PRT 

<213> Mycobacterium tuberculosis

<220>

<223> Protein sequence Rv3869
<400> 59

Met Gly Leu Arg Leu Thr Thr Lys Val Gln Val Ser Gly Trp Arg Phe
1 5 10 15

Leu Leu Arg Arg Leu Glu His Ala Ile Val Arg Arg Asp Thr Arg Met
20 25 30

Phe Asp Asp Pro Leu Gln Phe Tyr Ser Arg Ser Ile Ala Leu Gly Ile
35 40 45

Val Val Ala Val Leu Ile Leu Ala Gly Ala Ala Leu Leu Ala Tyr Phe
50 55 60

Lys Pro Gln Gly Lys Leu Gly Gly Thr Ser Leu Phe Thr Asp Arg Ala
65 70 75 80

Thr Asn Gln Leu Tyr Val Leu Leu Ser Gly Gln Leu His Pro Val Tyr
85 90 95

Asn Leu Thr Ser Ala Arg Leu Val Leu Gly Asn Pro Ala Asn Pro Ala
100 105 110

Thr Val Lys Ser Ser Glu Leu Ser Lys Leu Pro Met Gly Gln Thr Val
115 120 125

Gly Ile Pro Gly Ala Pro Tyr Ala Thr Pro Val Ser Ala Gly Ser Thr
130 135 140

Ser Ile Trp Thr Leu Cys Asp Thr Val Ala Arg Ala Asp Ser Thr Ser
145 150 155 160

Pro Val Val Gln Thr Ala Val Ile Ala Met Pro Leu Glu Ile Asp Ala
165 170 175

Ser Ile Asp Pro Leu Gln Ser His Glu Ala Val Leu Val Ser Tyr Gln
180 185 190

Gly Glu Thr Trp Ile Val Thr Thr Lys Gly Arg His Ala Ile Asp Leu
195 200 205

Thr Asp Arg Ala Leu Thr Ser Ser Met Gly Ile Pro Val Thr Ala Arg
210 215 220

Pro Thr Pro Ile Ser Glu Gly Met Phe Asn Ala Leu Pro Asp Met Gly
225 230 235 240

Pro Trp Gln Leu Pro Pro Ile Pro Ala Ala Gly Ala Pro Asn Ser Leu
245 250 255

Gly Leu Pro Asp Asp Leu Val Ile Gly Ser Val Phe Gln Ile His Thr
260 265 270

Asp Lys Gly Pro Gln Tyr Tyr Val Val Leu Pro Asp Gly Ile Ala Gln
275 280 285

Val Asn Ala Thr Thr Ala Ala Ala Leu Arg Ala Thr Gln Ala His Gly
290 295 300

Leu Val Ala Pro Pro Ala Met Val Pro Ser Leu Val Val Arg Ile Ala
305 310 315 320

Glu Arg Val Tyr Pro Ser Pro Leu Pro Asp Glu Pro Leu Lys Ile Val
325 330 335

Ser Arg Pro Gln Asp Pro Ala Leu Cys Trp Ser Trp Gln Arg Ser Ala
340 345 350

Gly Asp Gln Ser Pro Gln Ser Thr Val Leu Ser Gly Arg His Leu Pro
355 360 365

Ile Ser Pro Ser Ala Met Asn Met Gly Ile Lys Gln Ile His Gly Thr
370 375 380

Ala Thr Val Tyr Leu Asp Gly Gly Lys Phe Val Ala Leu Gln Ser Pro
385 390 395 400

Asp Pro Arg Tyr Thr Glu Ser Met Tyr Tyr Ile Asp Pro Gln Gly Val
405 410 415

Arg Tyr Gly Val Pro Asn Ala Glu Thr Ala Lys Ser Leu Gly Leu Ser
420 425 430

Ser Pro Gln Asn Ala Pro Trp Glu Ile Val Arg Leu Leu Val Asp Gly
435 440 445

Pro Val Leu Ser Lys Asp Ala Ala Leu Leu Glu His Asp Thr Leu Pro
450 455 460

Ala Asp Pro Ser Pro Arg Lys Val Pro Ala Gly Ala Ser Gly Ala Pro
465 470 475 480



<210> 60 
<211> 747 
<212> PRT 

<213> Mycobacterium tuberculosis
<220>

<223> Protein sequence Rv3870
<400> 60

Met Thr Thr Lys Lys Phe Thr Pro Thr Ile Thr Arg Gly Pro Arg Leu
1 5 10 15

Thr Pro Gly Glu Ile Ser Leu Thr Pro Pro Asp Asp Leu Gly Ile Asp
20 25 30

Ile Pro Pro Ser Gly Val Gln Lys Ile Leu Pro Tyr Val Met Gly Gly
35 40 45

Ala Met Leu Gly Met Ile Ala Ile Met Val Ala Gly Gly Thr Arg Gln
50 55 60

Leu Ser Pro Tyr Met Leu Met Met Pro Leu Met Met Ile Val Met Met
65 70 75 80

Val Gly Gly Leu Ala Gly Ser Thr Gly Gly Gly Gly Lys Lys Val Pro
85 90 95

Glu Ile Asn Ala Asp Arg Lys Glu Tyr Leu Arg Tyr Leu Ala Gly Leu
100 105 110

Arg Thr Arg Val Thr Ser Ser Ala Thr Ser Gln Val Ala Phe Phe Ser
115 120 125

Tyr His Ala Pro His Pro Glu Asp Leu Leu Ser Ile Val Gly Thr Gln
130 135 140

Arg Gln Trp Ser Arg Pro Ala Asn Ala Asp Phe Tyr Ala Ala Thr Arg
145 150 155 160

Ile Gly Ile Gly Asp Gln Pro Ala Val Asp Arg Leu Leu Lys Pro Ala
165 170 175

Val Gly Gly Glu Leu Ala Ala Ala Ser Ala Ala Pro Gln Pro Phe Leu
180 185 190

Glu Pro Val Ser His Met Trp Val Val Lys Phe Leu Arg Thr His Gly
195 200 205

Leu Ile His Asp Cys Pro Lys Leu Leu Gln Leu Arg Thr Phe Pro Thr
210 215 220

Ile Ala Ile Gly Gly Asp Leu Ala Gly Ala Ala Gly Leu Met Thr Ala
225 230 235 240

Met Ile Cys His Leu Ala Val Phe His Pro Pro Asp Leu Leu Gln Ile
245 250 255

Arg Val Leu Thr Glu Glu Pro Asp Asp Pro Asp Trp Ser Trp Leu Lys
260 265 270

Trp Leu Pro His Val Gln His Gln Thr Glu Thr Asp Ala Ala Gly Ser
275 280 285

Thr Arg Leu Ile Phe Thr Arg Gln Glu Gly Leu Ser Asp Leu Ala Ala
290 295 300

Arg Gly Pro His Ala Pro Asp Ser Leu Pro Gly Gly Pro Tyr Val Val
305 310 315 320

Val Val Asp Leu Thr Gly Gly Lys Ala Gly Phe Pro Pro Asp Gly Arg
325 330 335

Ala Gly Val Thr Val Ile Thr Leu Gly Asn His Arg Gly Ser Ala Tyr
340 345 350

Arg Ile Arg Val His Glu Asp Gly Thr Ala Asp Asp Arg Leu Pro Asn
355 360 365

Gln Ser Phe Arg Gln Val Thr Ser Val Thr Asp Arg Met Ser Pro Gln
370 375 380

Gln Ala Ser Arg Ile Ala Arg Lys Leu Ala Gly Trp Ser Ile Thr Gly
385 390 395 400

Thr Ile Leu Asp Lys Thr Ser Arg Val Gln Lys Lys Val Ala Thr Asp
405 410 415

Trp His Gln Leu Val Gly Ala Gln Ser Val Glu Glu Ile Thr Pro Ser
420 425 430

Arg Trp Arg Met Tyr Thr Asp Thr Asp Arg Asp Arg Leu Lys Ile Pro
435 440 445

Phe Gly His Glu Leu Lys Thr Gly Asn Val Met Tyr Leu Asp Ile Lys
450 455 460

Glu Gly Ala Glu Phe Gly Ala Gly Pro His Gly Met Leu Ile Gly Thr
465 470 475 480

Thr Gly Ser Gly Lys Ser Glu Phe Leu Arg Thr Leu Ile Leu Ser Leu
485 490 495

Val Ala Met Thr His Pro Asp Gln Val Asn Leu Leu Leu Thr Asp Phe
500 505 510

Lys Gly Gly Ser Thr Phe Leu Gly Met Glu Lys Leu Pro His Thr Ala
515 520 525

Ala Val Val Thr Asn Met Ala Glu Glu Ala Glu Leu Val Ser Arg Met
530 535 540

Gly Glu Val Leu Thr Gly Glu Leu Asp Arg Arg Gln Ser Ile Leu Arg
545 550 555 560

Gln Ala Gly Met Lys Val Gly Ala Ala Gly Ala Leu Ser Gly Val Ala
565 570 575

Glu Tyr Glu Lys Tyr Arg Glu Arg Gly Ala Asp Leu Pro Pro Leu Pro
580 585 590

Thr Leu Phe Val Val Val Asp Glu Phe Ala Glu Leu Leu Gln Ser His
595 600 605

Pro Asp Phe Ile Gly Leu Phe Asp Arg Ile Cys Arg Val Gly Arg Ser
610 615 620

Leu Arg Val His Leu Leu Leu Ala Thr Gln Ser Leu Gln Thr Gly Gly
625 630 635 640

Val Arg Ile Asp Lys Leu Glu Pro Asn Leu Thr Tyr Arg Ile Ala Leu
645 650 655

Arg Thr Thr Ser Ser His Glu Ser Lys Ala Val Ile Gly Thr Pro Glu
660 665 670

Ala Gln Tyr Ile Thr Asn Lys Glu Ser Gly Val Gly Phe Leu Arg Val
675 680 685

Gly Met Glu Asp Pro Val Lys Phe Ser Thr Phe Tyr Ile Ser Gly Pro
690 695 700

Tyr Met Pro Pro Ala Ala Gly Val Glu Thr Asn Gly Glu Ala Gly Gly
705 710 715 720

Pro Gly Gln Gln Thr Thr Arg Gln Ala Ala Arg Ile His Arg Phe Thr
725 730 735

Ala Ala Pro Val Leu Glu Glu Ala Pro Thr Pro
740 745
<210> 61
<211> 591
<212> PRT

<213> Mycobacterium tuberculosis
<220>

<223> Protein sequence Rv3871
<400> 61

Met Thr Ala Glu Pro Glu Val Arg Thr Leu Arg Glu Val Val Leu Asp
1 5 10 15

Gln Leu Gly Thr Ala Glu Ser Arg Ala Tyr Lys Met Trp Leu Pro Pro
20 25 30

Leu Thr Asn Pro Val Pro Leu Asn Glu Leu Ile Ala Arg Asp Arg Arg
35 40 45

Gln Pro Leu Arg Phe Ala Leu Gly Ile Met Asp Glu Pro Arg Arg His
50 55 60

Leu Gln Asp Val Trp Gly Val Asp Val Ser Gly Ala Gly Gly Asn Ile
65 70 75 80

Gly Ile Gly Gly Ala Pro Gln Thr Gly Lys Ser Thr Leu Leu Gln Thr
85 90 95

Met Val Met Ser Ala Ala Ala Thr His Ser Pro Arg Asn Val Gln Phe
100 105 110

Tyr Cys Ile Asp Leu Gly Gly Gly Gly Leu Ile Tyr Leu Glu Asn Leu
115 120 125

Pro His Val Gly Gly Val Ala Asn Arg Ser Glu Pro Asp Lys Val Asn
130 135 140

Arg Val Val Ala Glu Met Gln Ala Val Met Arg Gln Arg Glu Thr Thr
145 150 155 160

Phe Lys Glu His Arg Val Gly Ser Ile Gly Met Tyr Arg Gln Leu Arg
165 170 175

Asp Asp Pro Ser Gln Pro Val Ala Ser Asp Pro Tyr Gly Asp Val Phe
180 185 190

Leu Ile Ile Asp Gly Trp Pro Gly Phe Val Gly Glu Phe Pro Asp Leu
195 200 205

Glu Gly Gln Val Gln Asp Leu Ala Ala Gln Gly Leu Ala Phe Gly Val
210 215 220

His Val Ile Ile Ser Thr Pro Arg Trp Thr Glu Leu Lys Ser Arg Val
225 230 235 240

Arg Asp Tyr Leu Gly Thr Lys Ile Glu Phe Arg Leu Gly Asp Val Asn
245 250 255

Glu Thr Gln Ile Asp Arg Ile Thr Arg Glu Ile Pro Ala Asn Arg Pro
260 265 270

Gly Arg Ala Val Ser Met Glu Lys His His Leu Met Ile Gly Val Pro
275 280 285

Arg Phe Asp Gly Val His Ser Ala Asp Asn Leu Val Glu Ala Ile Thr
290 295 300

Ala Gly Val Thr Gln Ile Ala Ser Gln His Thr Glu Gln Ala Pro Pro
305 310 315 320

Val Arg Val Leu Pro Glu Arg Ile His Leu His Glu Leu Asp Pro Asn
325 330 335

Pro Pro Gly Pro Glu Ser Asp Tyr Arg Thr Arg Trp Glu Ile Pro Ile
340 345 350

Gly Leu Arg Glu Thr Asp Leu Thr Pro Ala His Cys His Met His Thr
355 360 365

Asn Pro His Leu Leu Ile Phe Gly Ala Ala Lys Ser Gly Lys Thr Thr
370 375 380

Ile Ala His Ala Ile Ala Arg Ala Ile Cys Ala Arg Asn Ser Pro Gln
385 390 395 400

Gln Val Arg Phe Met Leu Ala Asp Tyr Arg Ser Gly Leu Leu Asp Ala
405 410 415

Val Pro Asp Thr His Leu Leu Gly Ala Gly Ala Ile Asn Arg Asn Ser
420 425 430

Ala Ser Leu Asp Glu Ala Val Gln Ala Leu Ala Val Asn Leu Lys Lys
435 440 445

Arg Leu Pro Pro Thr Asp Leu Thr Thr Ala Gln Leu Arg Ser Arg Ser
450 455 460

Trp Trp Ser Gly Phe Asp Val Val Leu Leu Val Asp Asp Trp His Met
465 470 475 480

Ile Val Gly Ala Ala Gly Gly Met Pro Pro Met Ala Pro Leu Ala Pro
485 490 495

Leu Leu Pro Ala Ala Ala Asp Ile Gly Leu His Ile Ile Val Thr Cys
500 505 510

Gln Met Ser Gln Ala Tyr Lys Ala Thr Met Asp Lys Phe Val Gly Ala
515 520 525

Ala Phe Gly Ser Gly Ala Pro Thr Met Phe Leu Ser Gly Glu Lys Gln
530 535 540

Glu Phe Pro Ser Ser Glu Phe Lys Val Lys Arg Arg Pro Pro Gly Gln
545 550 555 560

Ala Phe Leu Val Ser Pro Asp Gly Lys Glu Val Ile Gln Ala Pro Tyr
565 570 575

Ile Glu Pro Pro Glu Glu Val Phe Ala Ala Pro Pro Ser Ala Gly
580 585 590
<210> 62
<211> 99
<212> PRT

<213> Mycobacterium tuberculosis
<220>

<223> Rv3872-PE35 - PE family-related protein

<400> 62

Met Glu Lys Met Ser His Asp Pro Ile Ala Ala Asp Ile Gly Thr Gln
1 5 10 15

Val Ser Asp Asn Ala Leu His Gly Val Thr Ala Gly Ser Thr Ala Leu
20 25 30

Thr Ser Val Thr Gly Leu Val Pro Ala Gly Ala Asp Glu Val Ser Ala
35 40 45

Gln Ala Ala Thr Ala Phe Thr Ser Glu Gly Ile Gln Leu Leu Ala Ser
50 55 60

Asn Ala Ser Ala Gln Asp Gln Leu His Arg Ala Gly Glu Ala Val Gln
65 70 75 80

Asp Val Ala Arg Thr Tyr Ser Gln Ile Asp Asp Gly Ala Ala Gly Val
85 90 95

Phe Ala Glu



<210> 63 
<211> 368 
<212> PRT 

<213> Mycobacterium tuberculosis 
<220> 

<223> Rv3873-PPE68 - PPE family protein

<400> 63

Me^Le"^ His Ala Met Pro Pro Glu Leu Asn Thr Ala Arg Leu Met
1 5 10 15

Ala Gly Ala Gly Pro Ala Pro Met Leu Ala Ala Ala Ala Gly Trp Gln
20 25 30

Thr Leu Ser Ala Ala Leu Asp Ala Gln Ala Val Glu Leu Thr Ala Arg
35 40 45

Leu Asn Ser Leu Gly Glu Ala Trp Thr Gly Gly Gly Ser Asp Lys Ala
50 55 60

Leu Ala Ala Ala Thr Pro Met Val Val Trp Leu Gln Thr Ala Ser Thr
65 70 75 80

Gln Ala Lys Thr Arg Ala Met Gln Ala Thr Ala Gln Ala Ala Ala Tyr
85 90 95

Thr Gln Ala Met Ala Thr Thr Pro Ser Leu Pro Glu Ile Ala Ala Asn
100 105 110

His Ile Thr Gln Ala Val Leu Thr Ala Thr Asn Phe Phe Gly Ile Asn
115 120 125

Thr Ile Pro Ile Ala Leu Thr Glu Met Asp Tyr Phe Ile Arg Met Trp
130 135 140

Asn Gln Ala Ala Leu Ala Met Glu Val Tyr Gln Ala Glu Thr Ala Val
145 150 155 160

Asn Thr Leu Phe Glu Lys Leu Glu Pro Met Ala Ser Ile Leu Asp Pro
165 170 175

Gly Ala Ser Gln Ser Thr Thr Asn Pro Ile Phe Gly Met Pro Ser Pro
180 185 190

Gly Ser Ser Thr Pro Val Gly Gln Leu Pro Pro Ala Ala Thr Gln Thr
195 200 205

Leu Gly Gln Leu Gly Glu Met Ser Gly Pro Met Gln Gln Leu Thr Gln
210 215 220

Pro Leu Gln Gln Val Thr Ser Leu Phe Ser Gln Val Gly Gly Thr Gly
225 230 235 240

Gly Gly Asn Pro Ala Asp Glu Glu Ala Ala Gln Met Gly Leu Leu Gly
245 250 255

Thr Ser Pro Leu Ser Asn His Pro Leu Ala Gly Gly Ser Gly Pro Ser
260 265 270

Ala Gly Ala Gly Leu Leu Arg Ala Glu Ser Leu Pro Gly Ala Gly Gly
275 280 285

Ser Leu Thr Arg Thr Pro Leu Met Ser Gln Leu Ile Glu Lys Pro Val
290 295 300

Ala Pro Ser Val Met Pro Ala Ala Ala Ala Gly Ser Ser Ala Thr Gly
305 310 315 320

Gly Ala Ala Pro Val Gly Ala Gly Ala Met Gly Gln Gly Ala Gln Ser
325 330 335

Gly Gly Ser Thr Arg Pro Gly Leu Val Ala Pro Ala Pro Leu Ala Gln
340 345 350

Glu Arg Glu Glu Asp Asp Glu Asp Asp Trp Asp Glu Glu Asp Asp Trp
355 360 365



<210> 64 
<211> 100 
<212> PRT 

<213> Mycobacterium tuberculosis
<220>

<223> Rv3874-esxB - 10 kDa culture filtrate antigen CFP10
<400> 64

Met Ala Glu Met Lys Thr Asp Ala Ala Thr Leu Ala Gln Glu Ala Gly
1 5 10 15

Asn Phe Glu Arg Ile Ser Gly Asp Leu Lys Thr Gln Ile Asp Gln Val
20 25 30

Glu Ser Thr Ala Gly Ser Leu Gln Gly Gln Trp Arg Gly Ala Ala Gly
35 40 45

Thr Ala Ala Gln Ala Ala Val Val Arg Phe Gln Glu Ala Ala Asn Lys
50 55 60

Gln Lys Gln Glu Leu Asp Glu Ile Ser Thr Asn Ile Arg Gln Ala Gly
65 70 75 80

Val Gln Tyr Ser Arg Ala Asp Glu Glu Gln Gln Gln Ala Leu Ser Ser
85 90 95

Gln Met Gly Phe
100



<210> 65 
<211> 95 
<212> PRT 

<213> Mycobacterium tuberculosis 
<220> 

<223> Rv3875-Esat6 - 6 kDa early secretory antigenic 
target Esat6 (Esat-6) 

<400> 65 

Met Thr Glu Gln Gln Trp Asn Phe Ala Gly Ile Glu Ala Ala Ala Ser
1 5 10 15

Ala Ile Gln Gly Asn Val Thr Ser Ile His Ser Leu Leu Asp Glu Gly
20 25 30

Lys Gln Ser Leu Thr Lys Leu Ala Ala Ala Trp Gly Gly Ser Gly Ser
35 40 45

Glu Ala Tyr Gln Gly Val Gln Gln Lys Trp Asp Ala Thr Ala Thr Glu
50 55 60

Leu Asn Asn Ala Leu Gln Asn Leu Ala Arg Thr Ile Ser Glu Ala Gly
65 70 75 80

Gln Ala Met Ala Ser Thr Glu Gly Asn Val Thr Gly Met Phe Ala
85 90 95
85 90 95 



<210> 66 
<211> 666 
<212> PRT 

<213> Mycobacterium tuberculosis
<220>

<223> Protein sequence Rv3876

<400> 66

Met Ala Ala Asp Tyr Asp Lys Leu Phe Arg Pro His Glu Gly Met Glu
1 5 10 15

Ala Pro Asp Asp Met Ala Ala Gln Pro Phe Phe Asp Pro Ser Ala Ser
20 25 30

Phe Pro Pro Ala Pro Ala Ser Ala Asn Leu Pro Lys Pro Asn Gly Gln
35 40 45

Thr Pro Pro Pro Thr Ser Asp Asp Leu Ser Glu Arg Phe Val Ser Ala
50 55 60

Pro Pro Pro Pro Pro Pro Pro Pro Pro Pro Pro Pro Pro Thr Pro Met
65 70 75 80

Pro Ile Ala Ala Gly Glu Pro Pro Ser Pro Glu Pro Ala Ala Ser Lys
85 90 95

Pro Pro Thr Pro Pro Met Pro Ile Ala Gly Pro Glu Pro Ala Pro Pro
100 105 110

Lys Pro Pro Thr Pro Pro Met Pro Ile Ala Gly Pro Glu Pro Ala Pro
115 120 125

Pro Lys Pro Pro Thr Pro Pro Met Pro Ile Ala Gly Pro Ala Pro Thr
130 135 140

Pro Thr Glu Ser Gln Leu Ala Pro Pro Arg Pro Pro Thr Pro Gln Thr
145 150 155 160

Pro Thr Gly Ala Pro Gln Gln Pro Glu Ser Pro Ala Pro His Val Pro
165 170 175

Ser His Gly Pro His Gln Pro Arg Arg Thr Ala Pro Ala Pro Pro Trp
180 185 190

Ala Lys Met Pro Ile Gly Glu Pro Pro Pro Ala Pro Ser Arg Pro Ser
195 200 205

Ala Ser Pro Ala Glu Pro Pro Thr Arg Pro Ala Pro Gln His Ser Arg
210 215 220

Arg Ala Arg Arg Gly His Arg Tyr Arg Thr Asp Thr Glu Arg Asn Val
225 230 235 240

Gly Lys Val Ala Thr Gly Pro Ser Ile Gln Ala Arg Leu Arg Ala Glu
245 250 255

Glu Ala Ser Gly Ala Gln Leu Ala Pro Gly Thr Glu Pro Ser Pro Ala
260 265 270

Pro Leu Gly Gln Pro Arg Ser Tyr Leu Ala Pro Pro Thr Arg Pro Ala
275 280 285

Pro Thr Glu Pro Pro Pro Ser Pro Ser Pro Gln Arg Asn Ser Gly Arg
290 295 300

Arg Ala Glu Arg Arg Val His Pro Asp Leu Ala Ala Gln His Ala Ala
305 310 315 320

Ala Gln Pro Asp Ser Ile Thr Ala Ala Thr Thr Gly Gly Arg Arg Arg
325 330 335

Lys Arg Ala Ala Pro Asp Leu Asp Ala Thr Gln Lys Ser Leu Arg Pro
340 345 350

Ala Ala Lys Gly Pro Lys Val Lys Lys Val Lys Pro Gln Lys Pro Lys
355 360 365

Ala Thr Lys Pro Pro Lys Val Val Ser Gln Arg Gly Trp Arg His Trp
370 375 380

Val His Ala Leu Thr Arg Ile Asn Leu Gly Leu Ser Pro Asp Glu Lys
385 390 395 400

Tyr Glu Leu Asp Leu His Ala Arg Val Arg Arg Asn Pro Arg Gly Ser
405 410 415

Tyr Gln Ile Ala Val Val Gly Leu Lys Gly Gly Ala Gly Lys Thr Thr
420 425 430

Leu Thr Ala Ala Leu Gly Ser Thr Leu Ala Gln Val Arg Ala Asp Arg
435 440 445

Ile Leu Ala Leu Asp Ala Asp Pro Gly Ala Gly Asn Leu Ala Asp Arg
450 455 460

Val Gly Arg Gln Ser Gly Ala Thr Ile Ala Asp Val Leu Ala Glu Lys
465 470 475 480

Glu Leu Ser His Tyr Asn Asp Ile Arg Ala His Thr Ser Val Asn Ala
485 490 495

Val Asn Leu Glu Val Leu Pro Ala Pro Glu Tyr Ser Ser Ala Gln Arg
500 505 510

Ala Leu Ser Asp Ala Asp Trp His Phe Ile Ala Asp Pro Ala Ser Arg
515 520 525

Phe Tyr Asn Leu Val Leu Ala Asp Cys Gly Ala Gly Phe Phe Asp Pro
530 535 540

Leu Thr Arg Gly Val Leu Ser Thr Val Ser Gly Val Val Val Val Ala
545 550 555 560

Ser Val Ser Ile Asp Gly Ala Gln Gln Ala Ser Val Ala Leu Asp Trp
565 570 575

Leu Arg Asn Asn Gly Tyr Gln Asp Leu Ala Ser Arg Ala Cys Val Val
580 585 590

Ile Asn His Ile Met Pro Gly Glu Pro Asn Val Ala Val Lys Asp Leu
595 600 605

Val Arg His Phe Glu Gln Gln Val Gln Pro Gly Arg Val Val Val Met
610 615 620

Pro Trp Asp Arg His Ile Ala Ala Gly Thr Glu Ile Ser Leu Asp Leu
625 630 635 640

Leu Asp Pro Ile Tyr Lys Arg Lys Val Leu Glu Leu Ala Ala Ala Leu
645 650 655

Ser Asp Asp Phe Glu Arg Ala Gly Arg Arg
660 665



<210> 67 
<211> 511 
<212> PRT 

<213> Mycobacterium tuberculosis
<220>

<223> Protein sequence Rv3877
<400> 67

Met Ser Ala Pro Ala Val Ala Ala Gly Pro Thr Ala Ala Gly Ala Thr
1 5 10 15

Ala Ala Arg Pro Ala Thr Thr Arg Val Thr Ile Leu Thr Gly Arg Arg
20 25 30

Met Thr Asp Leu Val Leu Pro Ala Ala Val Pro Met Glu Thr Tyr Ile
35 40 45

Asp Asp Thr Val Ala Val Leu Ser Glu Val Leu Glu Asp Thr Pro Ala
50 55 60

Asp Val Leu Gly Gly Phe Asp Phe Thr Ala Gln Gly Val Trp Ala Phe
65 70 75 80

Ala Arg Pro Gly Ser Pro Pro Leu Lys Leu Asp Gln Ser Leu Asp Asp
85 90 95

Ala Gly Val Val Asp Gly Ser Leu Leu Thr Leu Val Ser Val Ser Arg
100 105 110

Thr Glu Arg Tyr Arg Pro Leu Val Glu Asp Val Ile Asp Ala Ile Ala
115 120 125

Val Leu Asp Glu Ser Pro Glu Phe Asp Arg Thr Ala Leu Asn Arg Phe
130 135 140

Val Gly Ala Ala Ile Pro Leu Leu Thr Ala Pro Val Ile Gly Met Ala

155 • Lbu 



Gly 

145 150 



Met Arg Ala Trp Trp Glu Thr Gly Arg Ser Leu Trp Trp Pro Leu Ala
165 170 175

Ile Gly Ile Leu Gly Ile Ala Val Leu Val Gly Ser Phe Val Ala Asn
180 185 190

Arg Phe Tyr Gln Ser Gly His Leu Ala Glu Cys Leu Leu Val Thr Thr
195 200 205

Tyr Leu Leu Ile Ala Thr Ala Ala Ala Leu Ala Val Pro Leu Pro Arg
210 215 220

Gly Val Asn Ser Leu Gly Ala Pro Gln Val Ala Gly Ala Ala Thr Ala
225 230 235 240

Val Leu Phe Leu Thr Leu Met Thr Arg Gly Gly Pro Arg Lys Arg His
245 250 255

Glu Leu Ala Ser Phe Ala Val Ile Thr Ala Ile Ala Val Ile Ala Ala
260 265 270

Ala Ala Ala Phe Gly Tyr Gly Tyr Gln Asp Trp Val Pro Ala Gly Gly
275 280 285

Ile Ala Phe Gly Leu Phe Ile Val Thr Asn Ala Ala Lys Leu Thr Val
290 295 300

Ala Val Ala Arg Ile Ala Leu Pro Pro Ile Pro Val Pro Gly Glu Thr
305 310 315 320

Val Asp Asn Glu Glu Leu Leu Asp Pro Val Ala Thr Pro Glu Ala Thr
325 330 335

Ser Glu Glu Thr Pro Thr Trp Gln Ala Ile Ile Ala Ser Val Pro Ala
340 345 350

Ser Ala Val Arg Leu Thr Glu Arg Ser Lys Leu Ala Lys Gln Leu Leu
355 360 365

Ile Gly Tyr Val Thr Ser Gly Thr Leu Ile Leu Ala Ala Gly Ala Ile
370 375 380

Ala Val Val Val Arg Gly His Phe Phe Val His Ser Leu Val Val Ala
385 390 395 400

Gly Leu Ile Thr Thr Val Cys Gly Phe Arg Ser Arg Leu Tyr Ala Glu
405 410 415

Arg Trp Cys Ala Trp Ala Leu Leu Ala Ala Thr Val Ala Ile Pro Thr
420 425 430

Gly Leu Thr Ala Lys Leu Ile Ile Trp Tyr Pro His Tyr Ala Trp Leu
435 440 445

Leu Leu Ser Val Tyr Leu Thr Val Ala Leu Val Ala Leu Val Val Val
450 455 460

Gly Ser Met Ala His Val Arg Arg Val Ser Pro Val Val Lys Arg Thr
465 470 475 480

Glu Leu Ile Asp Gly Ala Met Ile Ala Ala Ile Ile Pro Met Leu



Leu 

485 



490 495



Leu Trp Ile Thr Gly Val Tyr Asp Thr Val Arg Asn Ile Arg Phe
500 505 510



<210> 68 
<211> 280 
<212> PRT 

<213> Mycobacterium tuberculosis 

<220>

<223> Rv3878 - conserved hypothetical alanine rich
protein
<400> 68

Met Ala Glu Pro Leu Ala Val Asp Pro Thr Gly Leu Ser Ala Ala Ala
1 5 10 15

Ala Lys Leu Ala Gly Leu Val Phe Pro Gln Pro Pro Ala Pro Ile Ala
20 25 30

Val Ser Gly Thr Asp Ser Val Val Ala Ala Ile Asn Glu Thr Met Pro
35 40 45

Ser Ile Glu Ser Leu Val Ser Asp Gly Leu Pro Gly Val Lys Ala Ala
50 55 60

Leu Thr Arg Thr Ala Ser Asn Met Asn Ala Ala Ala Asp Val Tyr Ala
65 70 75 80

Lys Thr Asp Gln Ser Leu Gly Thr Ser Leu Ser Gln Tyr Ala Phe Gly
85 90 95

Ser Ser Gly Glu Gly Leu Ala Gly Val Ala Ser Val Gly Gly Gln Pro
100 105 110

Ser Gln Ala Thr Gln Leu Leu Ser Thr Pro Val Ser Gln Val Thr Thr
115 120 125

Gln Leu Gly Glu Thr Ala Ala Glu Leu Ala Pro Arg Val Val Ala Thr
130 135 140

Val Pro Gln Leu Val Gln Leu Ala Pro His Ala Val Gln Met Ser Gln
145 150 155 160

Asn Ala Ser Pro Ile Ala Gln Thr Ile Ser Gln Thr Ala Gln Gln Ala
165 170 175

Ala Gln Ser Ala Gln Gly Gly Ser Gly Pro Met Pro Ala Gln Leu Ala
180 185 190

Ser Ala Glu Lys Pro Ala Thr Glu Gln Ala Glu Pro Val His Glu Val
195 200 205

Thr Asn Asp Asp Gln Gly Asp Gln Gly Asp Val Gln Pro Ala Glu Val
210 215 220

Val Ala Ala Ala Arg Asp Glu Gly Ala Gly Ala Ser Pro Gly Gln Gln
225 230 235 240

Pro Gly Gly Gly Val Pro Ala Gln Ala Met Asp Thr Gly Ala Gly Ala
245 250 255

Arg Pro Ala Ala Ser Pro Leu Ala Ala Pro Val Asp Pro Ser Thr Pro
260 265 270

Ala Pro Ser Thr Thr Thr Thr Leu
275 280



<210> 69 
<211> 729 
<212> PRT 

<213> Mycobacterium tuberculosis 
<220>

<223> Rv3879c - hypothetical alanine and proline rich 
protein 

<400> 69 

Met Ser Ile Thr Arg Pro Thr Gly Ser Tyr Ala Arg Gln Met Leu Asp
1 5 10 15

Pro Gly Gly Trp Val Glu Ala Asp Glu Asp Thr Phe Tyr Asp Arg Ala
20 25 30

Gln Glu Tyr Ser Gln Val Leu Gln Arg Val Thr Asp Val Leu Asp Thr
35 40 45

Cys Arg Gln Gln Lys Gly His Val Phe Glu Gly Gly Leu Trp Ser Gly
50 55 60

Gly Ala Ala Asn Ala Ala Asn Gly Ala Leu Gly Ala Asn Ile Asn Gln
65 70 75 80

Leu Met Thr Leu Gln Asp Tyr Leu Ala Thr Val Ile Thr Trp His Arg
85 90 95

His Ile Ala Gly Leu Ile Glu Gln Ala Lys Ser Asp Ile Gly Asn Asn
100 105 110

Val Asp Gly Ala Gln Arg Glu Ile Asp Ile Leu Glu Asn Asp Pro Ser
115 120 125

Leu Asp Ala Asp Glu Arg His Thr Ala Ile Asn Ser Leu Val Thr Ala
130 135 140 

Thr His Gly Ala Asn Val Ser Leu Val Ala Glu Thr Ala Glu Arg Val 
145 150 155 160 

Leu Glu Ser Lys Asn Trp Lys Pro Pro Lys Asn Ala Leu Glu Asp Leu 
165 170 175 

Leu Gln Gln Lys Ser Pro Pro Pro Pro Asp Val Pro Thr Leu Val Val
180 185 190

Pro Ser Pro Gly Thr Pro Gly Thr Pro Gly Thr Pro Ile Thr Pro Gly
195 200 205

Thr Pro Ile Thr Pro Gly Thr Pro Ile Thr Pro Ile Pro Gly Ala Pro
210 215 220

Val Thr Pro Ile Thr Pro Thr Pro Gly Thr Pro Val Thr Pro Val Thr
225 230 235 240 

Pro Gly Lys Pro Val Thr Pro Val Thr Pro Val Lys Pro Gly Thr Pro 
245 250 255 

Gly Glu Pro Thr Pro Ile Thr Pro Val Thr Pro Pro Val Ala Pro Ala
260 265 270 

Thr Pro Ala Thr Pro Ala Thr Pro Val Thr Pro Ala Pro Ala Pro His 
275 280 285 

Pro Gln Pro Ala Pro Ala Pro Ala Pro Ser Pro Gly Pro Gln Pro Val
290 295 300

Thr Pro Ala Thr Pro Gly Pro Ser Gly Pro Ala Thr Pro Gly Thr Pro
305 310 315 320

Gly Gly Glu Pro Ala Pro His Val Lys Pro Ala Ala Leu Ala Glu Gln
325 330 335

Pro Gly Val Pro Gly Gln His Ala Gly Gly Gly Thr Gln Ser Gly Pro
340 345 350 

Ala His Ala Asp Glu Ser Ala Ala Ser Val Thr Pro Ala Ala Ala Ser 
355 360 365 

Gly Val Pro Gly Ala Arg Ala Ala Ala Ala Ala Pro Ser Gly Thr Ala 
370 375 380 

Val Gly Ala Gly Ala Arg Ser Ser Val Gly Thr Ala Ala Ala Ser Gly 
385 390 395 400

Ala Gly Ser His Ala Ala Thr Gly Arg Ala Pro Val Ala Thr Ser Asp 
405 410 415 

Lys Ala Ala Ala Pro Ser Thr Arg Ala Ala Ser Ala Arg Thr Ala Pro 
420 425 430 

Pro Ala Arg Pro Pro Ser Thr Asp His Ile Asp Lys Pro Asp Arg Ser
435 440 445

Glu Ser Ala Asp Asp Gly Thr Pro Val Ser Met Ile Pro Val Ser Ala
450 455 460

Ala Arg Ala Ala Arg Asp Ala Ala Thr Ala Ala Ala Ser Ala Arg Gln
465 470 475 480

Arg Gly Arg Gly Asp Ala Leu Arg Leu Ala Arg Arg Ile Ala Ala Ala
485 490 495

Leu Asn Ala Ser Asp Asn Asn Ala Gly Asp Tyr Gly Phe Phe Trp Ile
500 505 510

Thr Ala Val Thr Thr Asp Gly Ser Ile Val Val Ala Asn Ser Tyr Gly
515 520 525

Leu Ala Tyr Ile Pro Asp Gly Met Glu Leu Pro Asn Lys Val Tyr Leu
530 535 540

Ala Ser Ala Asp His Ala Ile Pro Val Asp Glu Ile Ala Arg Cys Ala
545 550 555 560

Thr Tyr Pro Val Leu Ala Val Gln Ala Trp Ala Ala Phe His Asp Met
565 570 575

Thr Leu Arg Ala Val Ile Gly Thr Ala Glu Gln Leu Ala Ser Ser Asp
580 585 590

Pro Gly Val Ala Lys Ile Val Leu Glu Pro Asp Asp Ile Pro Glu Ser
595 600 605

Gly Lys Met Thr Gly Arg Ser Arg Leu Glu Val Val Asp Pro Ser Ala
610 615 620



WO 03/085098 PCT/IB03/01789

57/66 

Ala Ala Gln Leu Ala Asp Thr Thr Asp Gln Arg Leu Leu Asp Leu Leu
625 630 635 640

Pro Pro Ala Pro Val Asp Val Asn Pro Pro Gly Asp Glu Arg His Met 
645 650 655 

Leu Trp Phe Glu Leu Met Lys Pro Met Thr Ser Thr Ala Thr Gly Arg 
660 665 670 

Glu Ala Ala His Leu Arg Ala Phe Arg Ala Tyr Ala Ala His Ser Gln
675 680 685

Glu Ile Ala Leu His Gln Ala His Thr Ala Thr Asp Ala Ala Val Gln
690 695 700

Arg Val Ala Val Ala Asp Trp Leu Tyr Trp Gln Tyr Val Thr Gly Leu
705 710 715 720 

Leu Asp Arg Ala Leu Ala Ala Ala Cys 
725 



<210> 70 
<211> 115 
<212> PRT 

<213> Mycobacterium tuberculosis 
<220> 

<223> Rv3880c - conserved hypothetical protein 

<400> 70

Val Ser Met Asp Glu Leu Asp Pro His Val Ala Arg Ala Leu Thr Leu
1 5 10 15

Ala Ala Arg Phe Gln Ser Ala Leu Asp Gly Thr Leu Asn Gln Met Asn
20 25 30

Asn Gly Ser Phe Arg Ala Thr Asp Glu Ala Glu Thr Val Glu Val Thr
35 40 45

Ile Asn Gly His Gln Trp Leu Thr Gly Leu Arg Ile Glu Asp Gly Leu
50 55 60

Leu Lys Lys Leu Gly Ala Glu Ala Val Ala Gln Arg Val Asn Glu Ala
65 70 75 80

Leu His Asn Ala Gln Ala Ala Ala Ser Ala Tyr Asn Asp Ala Ala Gly
85 90 95

Glu Gln Leu Thr Ala Ala Leu Ser Ala Met Ser Arg Ala Met Asn Glu
100 105 110

Gly Met Ala
115 



<210> 71 
<211> 460 
<212> PRT 

<213> Mycobacterium tuberculosis 
<220>

<223> Rv3881c - conserved hypothetical alanine and
glycine rich protein

<400> 71

Met Thr Gln Ser Gln Thr Val Thr Val Asp Gln Gln Glu Ile Leu Asn
1 5 10 15

Arg Ala Asn Glu Val Glu Ala Pro Met Ala Asp Pro Pro Thr Asp Val
20 25 30

Pro Ile Thr Pro Cys Glu Leu Thr Ala Ala Lys Asn Ala Ala Gln Gln
35 40 45

Leu Val Leu Ser Ala Asp Asn Met Arg Glu Tyr Leu Ala Ala Gly Ala
50 55 60

Lys Glu Arg Gln Arg Leu Ala Thr Ser Leu Arg Asn Ala Ala Lys Ala
65 70 75 80

Tyr Gly Glu Val Asp Glu Glu Ala Ala Thr Ala Leu Asp Asn Asp Gly
85 90 95

Glu Gly Thr Val Gln Ala Glu Ser Ala Gly Ala Val Gly Gly Asp Ser
100 105 110

Ser Ala Glu Leu Thr Asp Thr Pro Arg Val Ala Thr Ala Gly Glu Pro
115 120 125

Asn Phe Met Asp Leu Lys Glu Ala Ala Arg Lys Leu Glu Thr Gly Asp
130 135 140

Gln Gly Ala Ser Leu Ala His Phe Ala Asp Gly Trp Asn Thr Phe Asn
145 150 155 160

Leu Thr Leu Gln Gly Asp Val Lys Arg Phe Arg Gly Phe Asp Asn Trp
165 170 175

Glu Gly Asp Ala Ala Thr Ala Cys Glu Ala Ser Leu Asp Gln Gln Arg
180 185 190

Gln Trp Ile Leu His Met Ala Lys Leu Ser Ala Ala Met Ala Lys Gln
195 200 205

Ala Gln Tyr Val Ala Gln Leu His Val Trp Ala Arg Arg Glu His Pro
210 215 220

Thr Tyr Glu Asp Ile Val Gly Leu Glu Arg Leu Tyr Ala Glu Asn Pro
225 230 235 240

Ser Ala Arg Asp Gln Ile Leu Pro Val Tyr Ala Glu Tyr Gln Gln Arg
245 250 255

Ser Glu Lys Val Leu Thr Glu Tyr Asn Asn Lys Ala Ala Leu Glu Pro
260 265 270

Val Asn Pro Pro Lys Pro Pro Pro Ala Ile Lys Ile Asp Pro Pro Pro
275 280 285

Pro Pro Gln Glu Gln Gly Leu Ile Pro Gly Phe Leu Met Pro Pro Ser
290 295 300

Asp Gly Ser Gly Val Thr Pro Gly Thr Gly Met Pro Ala Ala Pro Met
305 310 315 320

Val Pro Pro Thr Gly Ser Pro Gly Gly Gly Leu Pro Ala Asp Thr Ala
325 330 335

Ala Gln Leu Thr Ser Ala Gly Arg Glu Ala Ala Ala Leu Ser Gly Asp
340 345 350

Val Ala Val Lys Ala Ala Ser Leu Gly Gly Gly Gly Gly Gly Gly Val
355 360 365

Pro Ser Ala Pro Leu Gly Ser Ala Ile Gly Gly Ala Glu Ser Val Arg
370 375 380

Pro Ala Gly Ala Gly Asp Ile Ala Gly Leu Gly Gln Gly Arg Ala Gly
385 390 395 400

Gly Gly Ala Ala Leu Gly Gly Gly Gly Met Gly Met Pro Met Gly Ala
405 410 415

Ala His Gln Gly Gln Gly Gly Ala Lys Ser Lys Gly Ser Gln Gln Glu
420 425 430

Asp Glu Ala Leu Tyr Thr Glu Asp Arg Ala Trp Thr Glu Ala Val Ile
435 440 445

Gly Asn Arg Arg Arg Gln Asp Ser Lys Glu Ser Lys
450 455 460 



<210> 72 
<211> 462 
<212> PRT 

<213> Mycobacterium tuberculosis 
<220> 

<223> Rv3882c - possible conserved membrane protein

<400> 72

Met Arg Asn Pro Leu Gly Leu Arg Phe Ser Thr Gly His Ala Leu Leu
1 5 10 15

Ala Ser Ala Leu Ala Pro Pro Cys Ile Ile Ala Phe Leu Glu Thr Arg
20 25 30

Tyr Trp Trp Ala Gly Ile Ala Leu Ala Ser Leu Gly Val Ile Val Ala
35 40 45

Thr Val Thr Phe Tyr Gly Arg Arg Ile Thr Gly Trp Val Ala Ala Val
50 55 60

Tyr Ala Trp Leu Arg Arg Arg Arg Arg Pro Pro Asp Ser Ser Ser Glu
65 70 75 80

Pro Val Val Gly Ala Thr Val Lys Pro Gly Asp His Val Ala Val Arg
85 90 95

Trp Gln Gly Glu Phe Leu Val Ala Val Ile Glu Leu Ile Pro Arg Pro
100 105 110






Phe Thr Pro Thr Val Ile Val Asp Gly Gln Ala His Thr Asp Asp Met
115 120 125

Leu Asp Thr Gly Leu Val Glu Glu Leu Leu Ser Val His Cys Pro Asp
130 135 140

Leu Glu Ala Asp Ile Val Ser Ala Gly Tyr Arg Val Gly Asn Thr Ala
145 150 155 160

Ala Pro Asp Val Val Ser Leu Tyr Gln Gln Val Ile Gly Thr Asp Pro
165 170 175

Ala Pro Ala Asn Arg Arg Thr Trp Ile Val Leu Arg Ala Asp Pro Glu
180 185 190

Arg Thr Arg Lys Ser Ala Gln Arg Arg Asp Glu Gly Val Ala Gly Leu
195 200 205

Ala Arg Tyr Leu Val Ala Ser Ala Thr Arg Ile Ala Asp Arg Leu Ala
210 215 220

Ser His Gly Val Asp Ala Val Cys Gly Arg Ser Phe Asp Asp Tyr Asp
225 230 235 240

His Ala Thr Asp Ile Gly Phe Val Arg Glu Lys Trp Ser Met Ile Lys
245 250 255

Gly Arg Asp Ala Tyr Thr Ala Ala Tyr Ala Ala Pro Gly Gly Pro Asp
260 265 270

Val Trp Trp Ser Ala Arg Ala Asp His Thr Ile Thr Arg Val Arg Val
275 280 285

Ala Pro Gly Met Ala Pro Gln Ser Thr Val Leu Leu Thr Thr Ala Asp
290 295 300

Lys Pro Lys Thr Pro Arg Gly Phe Ala Arg Leu Phe Gly Gly Gln Arg
305 310 315 320

Pro Ala Leu Gln Gly Gln His Leu Val Ala Asn Arg His Cys Gln Leu
325 330 335

Pro Ile Gly Ser Ala Gly Val Leu Val Gly Glu Thr Val Asn Arg Cys
340 345 350

Pro Val Tyr Met Pro Phe Asp Asp Val Asp Ile Ala Leu Asn Leu Gly
355 360 365

Asp Ala Gln Thr Phe Thr Gln Phe Val Val Arg Ala Ala Ala Ala Gly
370 375 380

Ala Met Val Thr Val Gly Pro Gln Phe Glu Glu Phe Ala Arg Leu Ile
385 390 395 400

Gly Ala His Ile Gly Gln Glu Val Lys Val Ala Trp Pro Asn Ala Thr
405 410 415

Thr Tyr Leu Gly Pro His Pro Gly Ile Asp Arg Val Ile Leu Arg His
420 425 430

Asn Val Ile Gly Thr Pro Arg His Arg Gln Leu Pro Ile Arg Arg Val
435 440 445

Ser Pro Pro Glu Glu Ser Arg Tyr Gln Met Ala Leu Pro Lys
450 455 460 



<210> 73 
<211> 446 
<212> PRT 

<213> Mycobacterium tuberculosis 
<220> 

<223> Rv3883c - possible secreted protease 
<400> 73 

Val His Arg Ile Phe Leu Ile Thr Val Ala Leu Ala Leu Leu Thr Ala
1 5 10 15

Ser Pro Ala Ser Ala Ile Thr Pro Pro Pro Ile Asp Pro Gly Ala Leu
20 25 30

Pro Pro Asp Val Thr Gly Pro Asp Gln Pro Thr Glu Gln Arg Val Leu
35 40 45

Cys Ala Ser Pro Thr Thr Leu Pro Gly Ser Gly Phe His Asp Pro Pro
50 55 60

Trp Ser Asn Thr Tyr Leu Gly Val Ala Asp Ala His Lys Phe Ala Thr
65 70 75 80

Gly Ala Gly Val Thr Val Ala Val Ile Asp Thr Gly Val Asp Ala Ser
85 90 95

Pro Arg Val Pro Ala Glu Pro Gly Gly Asp Phe Val Asp Gln Ala Gly
100 105 110

Asn Gly Leu Ser Asp Cys Asp Ala His Gly Thr Leu Thr Ala Ser Ile
115 120 125

Ile Ala Gly Arg Pro Ala Pro Thr Asp Gly Phe Val Gly Val Ala Pro
130 135 140

Asp Ala Arg Leu Leu Ser Leu Arg Gln Thr Ser Glu Ala Phe Glu Pro
145 150 155 160

Val Gly Ser Gln Ala Asn Pro Asn Asp Pro Asn Ala Thr Pro Ala Ala
165 170 175

Gly Ser Ile Arg Ser Leu Ala Arg Ala Val Val His Ala Ala Asn Leu
180 185 190

Gly Val Gly Val Ile Asn Ile Ser Glu Ala Ala Cys Tyr Lys Val Ser
195 200 205

Arg Pro Ile Asp Glu Thr Ser Leu Gly Ala Ser Ile Asp Tyr Ala Val
210 215 220

Asn Val Lys Gly Val Val Val Val Val Ala Ala Gly Asn Thr Gly Gly
225 230 235 240

Asp Cys Val Gln Asn Pro Ala Pro Asp Pro Ser Thr Pro Gly Asp Pro
245 250 255

Arg Gly Trp Asn Asn Val Gln Thr Val Val Thr Pro Ala Trp Tyr Ala
260 265 270

Pro Leu Val Leu Ser Val Gly Gly Ile Gly Gln Thr Gly Met Pro Ser
275 280 285

Ser Phe Ser Met His Gly Pro Trp Val Asp Val Ala Ala Pro Ala Glu
290 295 300 

Asn Ile Val Ala Leu Gly Asp Thr Gly Glu Pro Val Asn Ala Leu Gln
305 310 315 320

Gly Arg Glu Gly Pro Val Pro Ile Ala Gly Thr Ser Phe Ala Ala Ala
325 330 335

Tyr Val Ser Gly Leu Ala Ala Leu Leu Arg Gln Arg Phe Pro Asp Leu
340 345 350

Thr Pro Ala Gln Ile Ile His Arg Ile Thr Ala Thr Ala Arg His Pro
355 360 365

Gly Gly Gly Val Asp Asp Leu Val Gly Ala Gly Val Ile Asp Ala Val
370 375 380

Ala Ala Leu Thr Trp Asp Ile Pro Pro Gly Pro Ala Ser Ala Pro Tyr
385 390 395 400

Asn Val Arg Arg Leu Pro Pro Pro Val Val Glu Pro Gly Pro Asp Arg
405 410 415

Arg Pro Ile Thr Ala Val Ala Leu Val Ala Val Gly Leu Thr Leu Ala
420 425 430

Leu Gly Leu Gly Ala Leu Ala Arg Arg Ala Leu Ser Arg Arg
435 440 445



<210> 74 
<211> 619 
<212> PRT 

<213> Mycobacterium tuberculosis 
<220> 

<223> Rv3884c - probable CBXX/CFQX family protein

<400> 74

Met Ser Arg Met Val Asp Thr Met Gly Asp Leu Leu Thr Ala Arg Arg
1 5 10 15

His Phe Asp Arg Ala Met Thr Ile Lys Asn Gly Gln Gly Cys Val Ala
20 25 30

Ala Leu Pro Glu Phe Val Ala Ala Thr Glu Ala Asp Pro Ser Met Ala
35 40 45

Asp Ala Trp Leu Gly Arg Ile Ala Cys Gly Asp Arg Asp Leu Ala Ser
50 55 60

Leu Lys Gln Leu Asn Ala His Ser Glu Trp Leu His Arg Glu Thr Thr
65 70 75 80

Arg Ile Gly Arg Thr Leu Ala Ala Glu Val Gln Leu Gly Pro Ser Ile
85 90 95

Gly Ile Thr Val Thr Asp Ala Ser Gln Val Gly Leu Ala Leu Ser Ser
100 105 110

Ala Leu Thr Ile Ala Gly Glu Tyr Ala Lys Ala Asp Ala Leu Leu Ala
115 120 125

Asn Arg Glu Leu Leu Asp Ser Trp Arg Asn Tyr Gln Trp His Gln Leu
130 135 140

Ala Arg Ala Phe Leu Met Tyr Val Thr Gln Arg Trp Pro Asp Val Leu
145 150 155 160

Ser Thr Ala Ala Glu Asp Leu Pro Pro Gln Ala Ile Val Met Pro Ala
165 170 175

Val Thr Ala Ser Ile Cys Ala Leu Ala Ala His Ala Ala Ala His Leu
180 185 190

Gly Gln Gly Arg Val Ala Leu Asp Trp Leu Asp Arg Val Asp Val Ile
195 200 205

Gly His Ser Arg Ser Ser Glu Arg Phe Gly Ala Asp Val Leu Thr Ala
210 215 220

Ala Ile Gly Pro Ala Asp Ile Pro Leu Leu Val Ala Asp Leu Ala Tyr
225 230 235 240

Val Arg Gly Met Val Tyr Arg Gln Leu His Glu Glu Asp Lys Ala Gln
245 250 255

Ile Trp Leu Ser Lys Ala Thr Ile Asn Gly Val Leu Thr Asp Ala Ala
260 265 270

Lys Glu Ala Leu Ala Asp Pro Asn Leu Arg Leu Ile Val Thr Asp Glu
275 280 285

Arg Thr Ile Ala Ser Arg Ser Asp Arg Trp Asp Ala Ser Thr Ala Lys
290 295 300

Ser Arg Asp Gln Leu Asp Asp Asp Asn Ala Ala Gln Arg Arg Gly Glu
305 310 315 320

Leu Leu Ala Glu Gly Arg Glu Leu Leu Ala Lys Gln Val Gly Leu Ala
325 330 335

Ala Val Lys Gln Ala Val Ser Ala Leu Glu Asp Gln Leu Glu Val Arg
340 345 350

Met Met Arg Leu Glu His Gly Leu Pro Val Glu Gly Gln Thr Asn His
355 360 365

Met Leu Leu Val Gly Pro Pro Gly Thr Gly Lys Thr Thr Thr Ala Glu
370 375 380

Ala Leu Gly Lys Ile Tyr Ala Gly Met Gly Ile Val Arg His Pro Glu
385 390 395 400

Ile Arg Glu Val Arg Arg Ser Asp Phe Cys Gly His Tyr Ile Gly Glu
405 410 415

Ser Gly Pro Lys Thr Asn Glu Leu Ile Glu Lys Ser Leu Gly Arg Ile
420 425 430

Ile Phe Met Asp Glu Phe Tyr Ser Leu Ile Glu Arg His Gln Asp Gly
435 440 445

Thr Pro Asp Met Ile Gly Met Glu Ala Val Asn Gln Leu Leu Val Gln
450 455 460

Leu Glu Thr His Arg Phe Asp Phe Cys Phe Ile Gly Ala Gly Tyr Glu
465 470 475 480

Asp Gln Val Asp Glu Phe Leu Thr Val Asn Pro Gly Leu Ala Gly Arg
485 490 495

Phe Asn Arg Lys Leu Arg Phe Glu Ser Tyr Ser Pro Val Glu Ile Val
500 505 510

Glu Ile Gly His Arg Tyr Ala Thr Pro Arg Ala Ser Gln Leu Asp Asp
515 520 525

Ala Ala Arg Glu Val Phe Leu Asp Ala Val Thr Thr Ile Arg Asn Tyr
530 535 540

Thr Thr Pro Ser Gly Gln His Gly Ile Asp Ala Met Gln Asn Gly Arg
545 550 555 560

Phe Ala Arg Asn Val Ile Glu Arg Ala Glu Gly Phe Arg Asp Thr Arg
565 570 575

Val Val Ala Gln Lys Arg Ala Gly Gln Pro Val Ser Val Gln Asp Leu
580 585 590

Gln Ile Ile Thr Ala Thr Asp Ile Asp Ala Ala Ile Arg Ser Val Cys
595 600 605

Ser Asp Asn Arg Asp Met Ala Ala Ile Val Trp
610 615 

<210> 75 
<211> 537 
<212> PRT 

<213> Mycobacterium tuberculosis 
<220> 

<223> Rv3885c - possible conserved membrane protein 

<400> 75

Leu Thr Ser Lys Leu Thr Gly Phe Ser Pro Arg Ser Ala Arg Arg Val
1 5 10 15

Ala Gly Val Trp Thr Val Phe Val Leu Ala Ser Ala Gly Trp Ala Leu
20 25 30

Gly Gly Gln Leu Gly Ala Val Met Ala Val Val Val Gly Val Ala Leu
35 40 45

Val Phe Val Gln Trp Trp Gly Gln Pro Ala Trp Ser Trp Ala Val Leu
50 55 60

Gly Leu Arg Gly Arg Arg Pro Val Lys Trp Asn Asp Pro Ile Thr Leu
65 70 75 80

Ala Asn Asn Arg Ser Gly Gly Gly Val Arg Val Gln Asp Gly Val Ala
85 90 95

Val Val Ala Val Gln Leu Leu Gly Arg Ala His Arg Ala Thr Thr Val
100 105 110

Thr Gly Ser Val Thr Val Glu Ser Asp Asn Val Ile Asp Val Val Glu
115 120 125

Leu Ala Pro Leu Leu Arg His Pro Leu Asp Leu Glu Leu Asp Ser Ile
130 135 140

Ser Val Val Thr Phe Gly Ser Arg Thr Gly Thr Val Gly Asp Tyr Pro
145 150 155 160

Arg Val Tyr Asp Ala Glu Ile Gly Thr Pro Pro Tyr Ala Gly Arg Arg
165 170 175

Glu Thr Trp Leu Ile Met Arg Leu Pro Val Ile Gly Asn Thr Gln Ala
180 185 190

Leu Arg Trp Arg Thr Ser Val Gly Ala Ala Ala Ile Ser Val Ala Gln
195 200 205

Arg Val Ala Ser Ser Leu Arg Cys Gln Gly Leu Arg Ala Lys Leu Ala
210 215 220

Thr Ala Thr Asp Leu Ala Glu Leu Asp Arg Arg Leu Gly Ser Asp Ala
225 230 235 240

Val Ala Gly Ser Ala Gln Arg Trp Lys Ala Ile Arg Gly Glu Ala Gly
245 250 255

Trp Met Thr Thr Tyr Ala Tyr Pro Ala Glu Ala Ile Ser Ser Arg Val
260 265 270

Leu Ser Gln Ala Trp Thr Leu Arg Ala Asp Glu Val Ile Gln Asn Val
275 280 285

Thr Val Tyr Pro Asp Ala Thr Cys Thr Ala Thr Ile Thr Val Arg Thr
290 295 300

Pro Thr Pro Ala Pro Thr Pro Pro Ser Val Ile Leu Arg Arg Leu Asn
305 310 315 320

Gly Glu Gln Ala Ala Ala Ala Ala Ala Asn Met Cys Gly Pro Arg Pro
325 330 335

His Leu Arg Gly Gln Arg Arg Cys Pro Leu Pro Ala Gln Leu Val Thr
340 345 350

Glu Ile Gly Pro Ser Gly Val Leu Ile Gly Lys Leu Ser Asn Gly Asp
355 360 365

Arg Leu Met Ile Pro Val Thr Asp Ala Gly Glu Leu Ser Arg Val Phe
370 375 380

Val Ala Ala Asp Asp Thr Ile Ala Lys Arg Ile Val Ile Arg Val Val
385 390 395 400

Gly Ala Gly Glu Arg Val Cys Val His Thr Arg Asp Gln Glu Arg Trp
405 410 415

Ala Ser Val Arg Met Pro Gln Leu Ser Ile Val Gly Thr Pro Arg Pro
420 425 430

Ala Pro Arg Thr Thr Val Gly Val Val Glu Tyr Val Arg Arg Arg Lys
435 440 445

Asn Gly Asp Asp Gly Lys Ser Glu Gly Ser Gly Val Asp Val Ala Ile
450 455 460

Ser Pro Thr Pro Arg Pro Ala Ser Val Ile Thr Ile Ala Arg Pro Gly
465 470 475 480

Thr Ser Leu Ser Glu Ser Asp Arg His Gly Phe Glu Val Thr Ile Glu
485 490 495

Gln Ile Asp Arg Ala Thr Val Lys Val Gly Ala Ala Gly Gln Asn Trp
500 505 510

Leu Val Glu Met Glu Met Phe Arg Ala Glu Asn Arg Tyr Val Ser Leu
515 520 525

Glu Pro Val Thr Met Ser Ile Gly Arg
530 535 



